Today's Big Data Pulse

Data‑Engineering Bottlenecks Shift From Legacy Tech to Leadership Gaps
Three 2026 surveys of 1,629 data professionals show that weak leadership direction and poor requirements now account for 40% of top‑bottleneck votes, outpacing legacy systems at 25%. By April, 50% of respondents cite lack of clear ownership as the biggest pain point, while better tooling is mentioned by under 5%.
Also developing:
By the numbers: Ampere Analysis acquires PlumResearch
Snowflake’s New Coding Agent Is in a Category of Its Own, Says Head of Product
Snowflake unveiled Cortex Code, an AI‑driven coding assistant embedded in its data platform, at the Build conference in London. The tool can generate, explain, and optimize SQL and Python code, handle data‑engineering and analytics tasks, and is context‑aware of schemas, governance and compute, promising up to a ten‑fold productivity boost for data engineers. The launch coincided with a $200 million partnership with OpenAI and expands Snowflake’s Cortex AI suite, which also includes Cortex Analyst and Cortex Intelligence. Industry forecasts from GlobalData project agentic AI revenues to grow 48% annually, reaching $45.4 bn by 2029, underscoring the market’s appetite for embedded AI tools.
Petra Durnin: You Don't Need More Tech — You Need Better Data
In this episode, Petra Durnin, a veteran CRE researcher and tech‑to‑impact strategist, explains why the industry’s biggest hurdle isn’t more tools but cleaner, more integrated data. She walks through her career trajectory, from a temp analyst to leading data and...

TopQuadrant Launches Enterprise Context Platform to Build Trusted AI
TopQuadrant unveiled TQ Data Foundation, an enterprise context platform designed to close the AI "context gap" that keeps many large firms in pilot mode. The solution layers knowledge‑graph technology over existing data assets, delivering unified models, reference terms, metadata, and...

Adopting a Time Series Database with InfluxData
Cole Bowden’s DBTA webinar highlighted when organizations should replace generic data stacks with a purpose‑built time‑series database. He urged teams to start simple—if data fits in memory or a single drive, a traditional RDBMS may suffice, but scaling pressures demand...
Data Quality & Governance: Underrated Foundations Beyond Analytics
Data Quality and Data Governance are two of the most underrated but important areas in the data space. There are other areas to explore in data outside of Analytics.
Temporary Tables in Databricks SQL | Do You Actually Need Them?
The article reviews temporary tables in Databricks SQL, explaining how they store intermediate results for the duration of a session and can be referenced across multiple statements. It compares them to Common Table Expressions, highlighting performance gains when avoiding repeated...
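The scoping difference between a CTE and a temporary table can be sketched in a few lines. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in, not Databricks SQL itself; the `orders` table, its rows, and the `totals` name are hypothetical, and Databricks syntax and performance behavior may differ.

```python
import sqlite3

# One connection plays the role of one session (SQLite stand-in, not Databricks).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ana", 15.0), ("bo", 7.5)],
)

# A CTE is visible only inside the single statement that defines it.
cte_rows = conn.execute(
    """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
    )
    SELECT customer, total FROM totals ORDER BY customer
    """
).fetchall()

# After that statement finishes, the name "totals" no longer exists.
try:
    conn.execute("SELECT * FROM totals")
    cte_visible_later = True
except sqlite3.OperationalError:
    cte_visible_later = False

# A temporary table is computed once and can be reused by later statements
# for as long as the session (here, the connection) stays open.
conn.execute(
    """
    CREATE TEMP TABLE totals AS
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
    """
)
top_customer = conn.execute(
    "SELECT customer FROM totals ORDER BY total DESC LIMIT 1"
).fetchone()[0]
grand_total = conn.execute("SELECT SUM(total) FROM totals").fetchone()[0]

print(cte_rows, cte_visible_later, top_customer, grand_total)
```

The aggregation runs once for the temporary table and is then read by two separate statements, whereas the CTE version would have to repeat the aggregation in every statement that needs it.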
David Wheeler: Pg_clickhouse v0.1.4
The pg_clickhouse extension for PostgreSQL has been updated to version 0.1.4, a maintenance release that can be applied in‑place without running an ALTER EXTENSION command. The update resolves critical bugs: the binary driver now correctly inserts NULL into ClickHouse Nullable...

CVE-2026-25903 Impacts Apache NiFi Users
A new vulnerability, CVE‑2026‑25903, affects Apache NiFi versions 1.1.0 through 2.7.2 and was patched in 2.8.0. The flaw allows users with limited privileges to modify the configuration of already‑deployed restricted components, bypassing the platform’s authorization checks. While it does not...
Data Governance Without the Jargon: 30 Questions and Answers to Clarify Terms and Trends
Data governance has morphed into a catch‑all term covering quality, metadata, privacy, compliance, and digital strategy, creating ambiguity that blurs responsibilities and stalls decisions. A new resource, "What Is Data Governance? 30 Questions and Answers," builds on the Broadband Commission’s Data...

How to Build an Advanced, Interactive Exploratory Data Analysis Workflow Using PyGWalker and Feature-Engineered Data
The tutorial walks through building a fully interactive exploratory data analysis (EDA) workflow inside a Python notebook using PyGWalker. It starts with advanced feature engineering on the Titanic dataset, creating buckets, segments, and DuckDB‑safe columns for both row‑level and aggregated...
Design and Implementation of Cloud-Native Microservice Architectures for Scalable Insurance Analytics Platforms
A new study presents a cloud‑native microservice architecture designed for insurance analytics, leveraging Docker, Kubernetes, Kafka, and Spark to replace legacy monolithic systems. The design enables real‑time data ingestion, continuous AI model deployment, and automated scaling across services. Performance tests...

Microsoft Brings Two Data Halls Online in São Paulo, Brazil
Microsoft has brought two new data halls online in São Paulo, marking the first operational facilities under its $2.7 bn AI and cloud commitment to Brazil through 2027. The launch was announced at the Microsoft AI Tour by country president Priscyla Laham,...

VIDAA Takes on Nielsen, Comscore with V Index Measurement Solution
VIDAA, now rebranded as V, is set to launch its V Index measurement platform in Q2, blending traditional linear TV data with streaming‑app viewership. The solution will initially roll out across Europe, the Middle East and Africa, leveraging the company’s...
Pivot Tables: Business Data’s Everlasting REPL
Hot take: Pivot tables are the REPL for business data. Just like programmers use REPLs to quickly test code, business users use pivot tables to quickly test hypotheses about their data. Drag a field. See a result. Adjust. Repeat. This feedback loop is...

Quest Trusted Data Management Platform Makes It Easier for Organizations to Create Reusable Data Products
Quest Software unveiled the Trusted Data Management Platform, a unified suite that combines data modeling, cataloging, governance, quality, and a marketplace to deliver AI‑ready data across enterprises. The solution promises up to 54% faster data‑product delivery, up to 40% cost...

Dynatrace Announces Accelerated Growth and Innovation with AWS
Dynatrace announced accelerated growth by surpassing $1 billion in AWS Marketplace sales and earning the AWS Financial Services Competency. The company reported sustained triple‑digit growth over three years, driven by cloud‑native procurement demand. It also secured the AWS Agentic AI Specialization,...
SurrealDB 3.0 Wants to Replace Your Five-Database RAG Stack with One
SurrealDB launched version 3.0 alongside a $23 million Series A extension, bringing total funding to $44 million. The new release consolidates relational, vector and graph capabilities into a single Rust‑native engine, letting AI agents store memory, business logic and multimodal data transactionally. By...

Own Your Data, Not Just Model Tuning
AI teams love tuning models. But they ignore the bike chain: data. Outsourcing labeling to people who care much less about the app’s success. Messy internal docs. No structured knowledge base. No call transcripts. No clean SOPs. Then they ask: “Why isn’t the model improving?” The highest ROI in...

Hotel BI vs Excel: The Hidden Costs
Excel remains a default tool in hotels, but its apparent zero‑cost facade hides substantial operational expenses. Hotels can spend up to 125 hours each month cleaning, formatting, and moving data, turning revenue managers into data clerks. This manual burden erodes...
Master Six Core Concepts to Decode Regression Results
Most analysts can run a regression. Very few can explain what the output actually means. That gap is a statistics fundamentals problem. Not a tools problem. Not a Python problem. Not a years-of-experience problem. If you can't explain what your numbers mean, you...
LSEG Partners with Bank of America to Unlock Next Generation Data, Analytics and AI Ready Content for Clients
London Stock Exchange Group (LSEG) and Bank of America have entered a multi‑year partnership to embed LSEG’s data, analytics and workflow solutions across BofA’s platforms. The deal gives BofA clients governed, rights‑cleared data and AI‑ready content, powered by LSEG Workspace,...

The Myth of ‘Always On’: Confronting Data Center SPOFs
Recent incidents—a 2021 Texas freeze and a 2021 OVH fire in Strasbourg—highlight hidden single points of failure in data centers. The freeze delayed fuel deliveries, exposing over‑reliance on on‑site fuel, while the fire demonstrated how passive cooling designs can unintentionally...

Pure Storage Wants to Demolish Your Storage Silos
Pure Storage’s Cloud service extends its Purity operating environment to AWS and Azure, delivering a single‑pane‑of‑glass storage layer that feels identical to on‑prem FlashArray. By abstracting the underlying cloud hardware, the platform offers native APIs, replication, and security while adding...

Agoda Open Sources APIAgent to Convert Any REST or GraphQL API Into an MCP Server with Zero Code
Agoda has released APIAgent, an open‑source tool that turns any REST or GraphQL API into a Model Context Protocol (MCP) server with zero code and no deployments. The proxy reads OpenAPI or GraphQL schemas, generates tool definitions, and uses DuckDB...

Anthropic, Infosys to Build Custom AI Agents for Firms
Anthropic and Infosys have announced a partnership to create custom AI agents for enterprises, combining Anthropic's Claude large language models, including Claude Code, with Infosys' Topaz AI platform. The collaboration targets sectors such as telecommunications and financial services, aiming to automate...

China AI Startup Moonshot Targets $10 Billion Valuation
Moonshot, the Chinese AI startup behind the Kimi chatbot, is launching a new financing round that targets a $10 billion valuation. The company previously secured $500 million at a $4.3 billion valuation, and its existing backers—Alibaba, Tencent and 5Y Capital—have already committed more...

Data Is the New Oil, and Your Database Is the Only Way to Extract It
In this episode, Ryan interviews Shireesh Thota, Corporate Vice President of Azure Databases at Microsoft, about the rapid evolution of Microsoft's database offerings, including SQL Server, Cosmos DB, and Postgres, and how they fit into a unified Azure data platform....

Interview: CyrusOne on the Sustainable Innovation that Drives Datacentre Business Outcomes
CyrusOne’s vice‑president of environmental, health, safety and sustainability, Kyle Myers, says the company treats sustainability as a profit centre rather than a cost centre. By centralising ESG functions into a cross‑functional working group, CyrusOne has integrated green‑building standards across its...

Dell VP Says Discrete Beats Disaggregated Storage for AI
Dell VP David Noy announced that discrete storage architectures outperform disaggregated designs for large‑scale AI workloads, reversing Dell’s earlier promotion of disaggregation. He cites PowerScale’s integrated controller‑drive chassis as requiring less rack space and fewer switches while drawing less power. The...
Micron's World-First PCIe Gen 6 SSD Doubles Data Rates for AI Data Centers
Micron has begun mass production of the 9650 NVMe SSD, the world’s first PCIe Gen6 storage device. The drive delivers up to 28,000 MB/s sequential reads and 14,000 MB/s writes, roughly doubling Gen5 bandwidth while staying within a 25 W power envelope. Capacities range...

French AI Cloud Startup Policloud Plans 1,000 Sovereign Micro-Data Center Deployments by 2030
Policloud, a French AI‑focused cloud startup, announced a plan to deploy up to 1,000 sovereign micro‑data centers by 2030, delivering more than 250,000 GPUs. The company has already installed eight sites across France, the Gulf Cooperation Council and the United...

All About Feature Stores
Feature stores have moved from niche tools to core infrastructure for operational machine‑learning, providing a single source of truth for features used in both training and online inference. The concept was coined by Uber in 2017 and commercialized by Tecton...

Four Data‑Backed Tech Fields to Pursue in 2026
I did some digging with the help of ChatGPT and Claude. Here are 4 tech areas you can still explore in 2026, backed by data: • AI/ML – Data Analytics falls here • Cloud & Infrastructure • Security & Governance • Data Engineering Let me...
Vibhor Kumar: Pg_background: Make Postgres Do the Long Work (While Your Session Stays Light)
pg_background is a PostgreSQL extension that runs SQL statements asynchronously in dedicated background worker processes, letting client sessions stay lightweight. The new v2 API introduces a PID‑plus‑cookie handle that safeguards against PID reuse bugs, making long‑running jobs more reliable. Recent...

SK Hynix Proposes HBM and HBF Hybrid for LLM Inference
SK Hynix unveiled H³, a hybrid architecture that couples high‑bandwidth memory (HBM) with high‑bandwidth flash (HBF) on a single interposer to accelerate large‑language‑model (LLM) inference. HBF provides up to 16× the capacity of HBM while maintaining comparable bandwidth, serving as...

Cubbit Powers Swiss Cantonal-Level Sovereign Cloud for Ailanto
IT integrator Ailanto announced a sovereign cloud service for Swiss organizations built on Cubbit’s DS3 Composer software‑defined object storage. The offering launches with 1 PB of capacity hosted in Swiss‑based data centres and will expand later in 2026. It provides S3‑compatible,...

Decode Common SQL Errors and Their Real Fixes
Common SQL errors and what they REALLY mean:
"Column ambiguously defined": You joined tables with the same column name. Fix: Add table aliases (customers.id, not just id).
"Not a single-group group function": You mixed aggregated and non-aggregated columns. Fix: Add all non-aggregated columns to GROUP BY.
"Division...
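The first two fixes can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the `customers` and `orders` tables and their rows are made up for illustration, and the quoted messages read like Oracle's wording. SQLite phrases the ambiguity error differently and tolerates un-grouped columns in aggregate queries, so only the first failure is reproduced here; the GROUP BY fix is shown in the corrected query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
    """
)

# Both tables have an "id" column, so an unqualified "id" is ambiguous.
try:
    conn.execute(
        "SELECT id FROM customers JOIN orders ON customers.id = orders.customer_id"
    )
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Fix 1: qualify every column with a table alias.
# Fix 2: list every non-aggregated column in GROUP BY.
fixed_rows = conn.execute(
    """
    SELECT c.id, c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON c.id = o.customer_id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()

print(ambiguous_error)
print(fixed_rows)
```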
The Data Checkup: A Framework for Assessing the Health of Federal Datasets
The Data Checkup framework, launched by dataindex.us, offers a systematic way to evaluate the health of federal datasets across six risk dimensions. It moves beyond simple URL monitoring to assess historical and future availability, quality, statutory context, staffing, funding, and...
Start with Excel, SQL, Power BI for Analytics
Aspiring Data Analyst? Don’t overcomplicate it. Start building projects with tools like Excel, SQL, and Power BI.
TomTom Partners to Support Data-Driven Mobility and Infrastructure Planning
TomTom has entered a strategic partnership with engineering firm AECOM to embed TomTom’s high‑resolution mobility data into AECOM’s global planning workflows. The collaboration gives AECOM access to precise traffic, safety and congestion insights, enabling more accurate, data‑driven decisions for road...

Breaking the Silos: The Rise of the Open Lakehouse Architecture in 2026
In 2026 the open lakehouse has become the de‑facto enterprise data strategy, merging low‑cost data‑lake storage with warehouse‑grade ACID transactions via open standards. By adding a metadata and transactional layer atop object storage, organizations achieve a single source of truth...

New GraphRAG Solution Moves Beyond Vector-Only RAG – Knowledge Graphs Provide Context and Common Sense to AI
Graphwise unveiled GraphRAG, a low‑code AI workflow engine that replaces flat vector‑only retrieval with a knowledge‑graph‑backed semantic layer. The platform claims to cut hallucinations and boost answer accuracy from about 60 % to over 90 % by grounding large language models in...

Tract’s Fleet Data Centers Seeks $3.8bn to Fuel Nevada Build-Out
Fleet Data Centers, the development arm of Tract, announced a $3.8 billion senior secured note issuance to fund a 230 MW data center campus in Reno, Nevada. The facility, built on a 252‑acre site, is 100 percent leased to an unnamed investment‑grade tenant...

From Tea to Tech: UK’s Halma Rides Wave of Hyperscaler Spending
Halma Plc, a FTSE 100 firm with roots in 19th‑century tea plantations, is benefiting from a surge in AI‑driven data‑center spending by hyperscalers. The company’s shares have climbed for eight consecutive sessions, reaching an all‑time high. Analysts at Barclays and JPMorgan...

Fossefall Targets Data Center in Harpefoss, Norway
Norwegian data‑center specialists Fossefall and Polar DC are unveiling new projects aimed at scaling AI‑focused, renewable‑powered facilities. Fossefall plans a data centre on the former Harpefoss childcare site, part of its ambition to reach 500 MW of clean AI infrastructure by 2030,...

Blackstone-Owned Link Logistics Files to Develop Data Center Campus Outside Atlanta, Georgia
Blackstone‑owned Link Logistics, through its affiliate B9 Union City Owner LLC, has filed a Developments of Regional Impact application to build a new data‑center campus called the Crossings outside Union City, Georgia. The 231‑acre site could host up to five...

South Korean Cloud Firm Okestro Could Build 5MW Data Center
South Korean cloud provider Okestro has signed a Memorandum of Understanding with data‑center developer DC Korea to construct a 5 MW facility at Okestro’s headquarters in Yeouido, Seoul. DC Korea will handle design and construction while Okestro will supply cloud‑orchestration services...
Integrate Data Quality Assertions Directly Into Orchestration
I see data contracts and data quality as overlapping but different. Data contracts: what is the data and how do we enforce it? Data products: why do we need this data? In practice, I'd argue for asset-based data quality assertions. Every time a...
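One possible reading of "asset-based data quality assertions" is that each data asset carries its own checks, and the orchestrator runs them whenever the asset is built. The sketch below illustrates that reading only. The summary is truncated, so the author's actual design is unknown, and every name here (`Asset`, `materialize`, `AssertionFailed`, the two checks, the sample rows) is hypothetical rather than taken from any real orchestration tool.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]


class AssertionFailed(Exception):
    """Raised when a freshly built asset violates one of its checks."""


@dataclass
class Asset:
    name: str
    build: Callable[[], Rows]  # produces the asset's rows
    assertions: list[Callable[[Rows], bool]] = field(default_factory=list)


def materialize(asset: Asset) -> Rows:
    """Build the asset, then run its assertions before handing rows downstream."""
    rows = asset.build()
    for check in asset.assertions:
        if not check(rows):
            raise AssertionFailed(f"{asset.name}: {check.__name__} failed")
    return rows


def not_empty(rows: Rows) -> bool:
    return len(rows) > 0


def amounts_non_negative(rows: Rows) -> bool:
    return all(row["amount"] >= 0 for row in rows)


checks = [not_empty, amounts_non_negative]
orders = Asset("orders", lambda: [{"amount": 10.0}, {"amount": 2.5}], checks)
refunds = Asset("refunds", lambda: [{"amount": -3.0}], checks)

good_rows = materialize(orders)  # passes both checks

try:
    materialize(refunds)
    bad_error = None
except AssertionFailed as exc:
    bad_error = str(exc)

print(len(good_rows), bad_error)
```

Because the checks live on the asset rather than in a separate monitoring job, a failing assertion stops the run at the point where the bad data was produced.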
Data Guides Positioning, Yet Quality Creativity Wins
Working in entertainment analytics, I am often asked how best to position a title for success. Data can help you aim more accurately and efficiently. What it can’t do is provide the single most important element to success: a...

Telkom Indonesia Revisits NeutraDC Sale Plans - Report
Telkom Indonesia is re‑engaging advisors to sell a majority stake in its data‑center subsidiary NeutraDC, targeting a valuation between $1 billion and $1.5 billion. The company previously explored a sale in 2022 and considered minority stakes in 2024, with Goldman Sachs and...