Today's Big Data Pulse

Leadership Gaps Hamper Data Engineering Teams, Survey Finds
Three 2026 surveys of 1,629 data professionals reveal organizational issues now dominate data‑engineering bottlenecks. In January, weak leadership direction and poor requirements accounted for 40% of top‑bottleneck votes, while by April 50% cited lack of clear ownership as the biggest pain point. Legacy systems and tooling were far lower priorities, at 25% and under 5% respectively.
Also developing:
By the numbers: Sensor Tower acquires AppMagic to expand SMB offering
Informatica Adds Four Governance Features to Snowflake, Boosting Trusted AI Data
Informatica announced four new data management and governance capabilities for Snowflake, covering headless AI integration, row-level access policies, and Iceberg table scanning. The rollout aims to give enterprises a trusted, governed data foundation for AI agents and analytics workloads.
Maharashtra Unveils $200 Billion Data‑Centre Investment Pipeline
Maharashtra's state government disclosed a Rs 16.69 lakh crore ($200 bn) data‑centre investment pipeline covering 44 mega‑projects. The plan promises 23,800 MW of IT capacity and 146,000 direct jobs, and it positions the state as India's leading data hub.
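As a rough sanity check on the currency conversion quoted above, the short sketch below derives the rupee-to-dollar rate implied by the two headline figures. It assumes the $200 bn figure is rounded and uses only the standard definitions of lakh (10^5) and crore (10^7); the variable names are illustrative.

```python
# 1 lakh = 10**5 and 1 crore = 10**7, so 1 lakh crore = 10**12 rupees.
RUPEES_PER_LAKH_CRORE = 10**5 * 10**7

pipeline_inr = 16.69 * RUPEES_PER_LAKH_CRORE  # Rs 16.69 lakh crore, from the summary
pipeline_usd = 200e9                          # $200 bn, from the summary (likely rounded)

# Exchange rate implied by the two quoted figures (rupees per US dollar).
implied_rate = pipeline_inr / pipeline_usd
print(f"Implied rate: {implied_rate:.2f} INR per USD")
```

The two figures are consistent with each other at a rate in the low-to-mid 80s of rupees per dollar.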
Western Digital's Five‑Year AI Roadmap Puts Storage Ahead of Compute
Western Digital unveiled a five‑year AI infrastructure roadmap that shifts focus from expanding compute clusters to building ultra‑high‑capacity storage systems. The plan highlights 40TB UltraSMR ePMR drives now in qualification and a push toward 100TB+ HAMR HDDs, backed by long‑term...
AWS and SAP Launch Five Tools at SAPPHIRE 2026 to Cut Migration Time to Days
AWS and SAP announced five new capabilities at the SAPPHIRE 2026 conference, promising to reduce SAP cloud migration cycles from weeks to days for the 440,000+ companies that run SAP worldwide. The tools combine native AWS orchestration, private connectivity, generative...
Google Adds Continuous SQL Queries and Cross‑region Spanner Access to BigQuery
Google has rolled out continuous SQL queries and cross‑region federated access to Cloud Spanner in BigQuery, enabling always‑on analytics pipelines without extra egress fees. The upgrades turn the data warehouse into a reactive platform for real‑time decision making.
Databricks Launches Genie Code AI Agent and Deepens SAP Integration
Databricks announced the deployment of Genie Code, an autonomous AI agent embedded in its Lakeflow environment, and a tighter integration with SAP Business Data Cloud via Unity Catalog. The moves aim to automate data‑engineering workflows, improve governance, and expand the...
Google Cloud Enables Cross‑Engine Apache Iceberg Support in BigQuery
Google Cloud announced a preview that adds cross‑engine Apache Iceberg support to BigQuery, allowing the same Iceberg tables to be accessed from Spark, Flink, Trino and BigQuery without data duplication. The serverless Iceberg REST catalog aims to streamline lakehouse workflows...
Denodo Adds AWS Integrations to Power Governed Agentic AI in the Middle East
Denodo announced a suite of integrations with Amazon Web Services that embed its data‑virtualization platform into Bedrock AgentCore, SageMaker and Quick. The move aims to give AI agents trusted, real‑time access to governed data across hybrid and multi‑cloud environments, a...

Databricks Zerobus Streaming Ingestion for Delta Lake House
Databricks introduced Zerobus, a high‑throughput streaming service that writes data directly into Delta Lake tables, removing the need for external message buses like Kafka. The Python SDK (and others for Rust, Go, TypeScript, Java) lets developers stream Apache Arrow RecordBatches...
Span and Nvidia Launch XFRA Mini‑Data‑Center Units to Turn Homes Into Edge AI Nodes
Smart‑panel startup Span, together with Nvidia, introduced XFRA, a compact AI compute unit the size of an air conditioner that can be installed in a single‑family home. Each node packs 16 GPUs and draws 12.5 kW; 8,000 such nodes would equal the power...
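Taking the per-node figures quoted above at face value, the aggregate for an 8,000-node fleet can be worked out directly. This is a back-of-the-envelope sketch that assumes every node draws its full 12.5 kW at once; the constant names are illustrative.

```python
# Per-node figures as quoted in the summary.
NODE_POWER_KW = 12.5
GPUS_PER_NODE = 16
NODE_COUNT = 8_000

# Aggregate draw in megawatts (1 MW = 1,000 kW) and total GPU count.
total_power_mw = NODE_POWER_KW * NODE_COUNT / 1_000
total_gpus = GPUS_PER_NODE * NODE_COUNT

print(f"{NODE_COUNT:,} nodes: {total_power_mw:.0f} MW, {total_gpus:,} GPUs")
```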
Study Finds TikTok's Recommendation Engine Favours Anti‑Democratic Content in 2024 Election
Researchers at NYU Abu Dhabi’s AI and Society Lab used 323 bot accounts to audit TikTok’s For You page and found the platform disproportionately recommended anti‑Democratic videos during the 2024 election. The study, published in Nature, highlights algorithmic bias that...
WisdomAI Deploys Autonomous Analytics Agents to Streamline Enterprise Data Workflows
WisdomAI unveiled its Analytics Agents platform, enabling enterprises to design, test and deploy AI‑driven agents that autonomously explore, clean and act on data. The solution plugs into over 200 native integrations and is already being used by fintech firm Trumid...
Nvidia's Earnings Reveal AI Spending Shifts Beyond GPUs to Data Infrastructure
Nvidia reported record first‑quarter fiscal 2027 revenue of $81.6 billion, with data‑center sales climbing 92% to $75.2 billion. The company also announced a new reporting split that isolates a fast‑growing “ACIE” segment and highlighted networking revenue jumping 199% to $14.8 billion, underscoring a...
Semantic Layer Summit 2026 Positions Business Context as Core Infrastructure for Enterprise AI
More than 6,000 data leaders gathered at the Semantic Layer Summit 2026, where industry giants like Snowflake, Databricks and Anthropic affirmed the semantic layer as the foundation for accurate, governed enterprise AI. The summit underscored the need for shared business...
Michigan's Education Database Moves From Vision to Infrastructure
Michigan is advancing its MiGreatDataLake project by focusing on secure, interoperable infrastructure before deploying AI-driven analytics. The proof‑of‑concept phase, launched in January 2025, validates a medallion‑style data pipeline that moves raw student records through bronze, silver, gold, and platinum layers....
Don't Go Dark: Visibility Is a Data Engineering Skill
Data engineers often work in silence, producing valuable but invisible changes that can go unnoticed for weeks. The article revisits Jeff Atwood’s “don’t go dark” rule—three weeks without a visible deliverable signals a risk of hidden problems. It explains why...
Legacy Data Stacks Falter as AI Demands Real‑Time, Distributed Access
Enterprise data architectures built for batch queries and static pipelines are being outpaced by AI workloads that require instant, multi‑source data access. Analysts warn that firms that cling to legacy warehouses risk falling behind as AI‑driven operations migrate to lakehouse...

Delivering Successfully Governed Self-Service Analytics with Informatica and TrustLogix
A DBTA webinar featuring Informatica’s Vaibhav Suresh and TrustLogix’s Simon Thornell highlighted the growing chaos in self‑service analytics and presented a governance framework to tame it. They cited that 70% of data leaders believe their most valuable insights sit in...
D&B's Database of 642 Million Businesses Was Built for Humans, Not AI Agents. So They Rebuilt It.
Dun & Bradstreet rebuilt its Commercial Graph, a 642‑million‑business database, to serve AI agents rather than human analysts. The legacy system’s fragmented architecture and static relationships could not meet the sub‑second latency and dynamic data needs of machine‑driven credit, procurement,...
IBM and U.S. Commerce Dept. Launch $1B Quantum Foundry, Part of $2B CHIPS Initiative
IBM and the U.S. Department of Commerce announced a $1 billion grant to create America’s first purpose‑built quantum foundry for superconducting wafers. The award is part of a $2.013 billion CHIPS and Science Act package that also funds GlobalFoundries and other firms,...

Confluent Current London 2026 - Confidence in the New Streaming Age
At Confluent Current London 2026, CPO Shaun Clowes warned that legacy data pipelines are hindering the rise of agentic AI. He highlighted Confluent’s Tableflow, which streams data in real time to lakes and warehouses, eliminating batch ETL and lineage gaps. With Kafka...

Pinewood.AI Expands Dealer BI Platform with New Modules
Pinewood.AI has added Accounting & Finance and Customer modules to its dealer Business Intelligence platform, completing the Core Insights tier. The new modules deliver live operational dashboards, automated reporting and unified metrics, aiming to replace manual spreadsheet processes. Pinewood claims dealers can...
DataOps Market Projected to Hit $32.7B by 2035, Forecast Shows 21% CAGR
MarketGenics estimates the global DataOps market was worth $4.7 billion in 2025 and will expand to $32.7 billion by 2035, a compound annual growth rate of 21.4%. The surge reflects rising enterprise demand for agile data pipelines, AI‑powered analytics and cloud‑native automation.
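The growth rate can be checked from the two endpoint figures with the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1. The sketch below is a minimal illustration; the function name `cagr` is an assumption, not something taken from the forecast.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.214 means 21.4%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint figures from the summary, in billions of US dollars.
rate = cagr(4.7, 32.7, 2035 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

Over the ten-year span this gives roughly 21.4%, matching the quoted figure.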
Panasonic Boosts Enterprise BI Speed with Databricks Lakeflow
Panasonic's central data infrastructure team has migrated legacy ETL pipelines to Databricks Lakeflow, slashing data ingestion windows from five to six hours down to minutes and reducing pipeline failures that previously occurred about ten times a year. The move, detailed in a Databricks...
Snowflake Secures $58B SaaS Deal with GSA, Offering Up to 50% Off Cloud AI Services
Snowflake announced a multi-year agreement under the U.S. General Services Administration’s OneGov framework, delivering AI‑enabled data‑cloud services to all federal agencies with compute discounts of 20% to 50% and a 27% cut in storage rates. The deal, running through September 2027, marks a major...
Big Data Analytics in U.S. Finance: From Frontier to Settled Discipline
Big data analytics in U.S. finance has moved from a frontier technology to a settled discipline, with cloud warehouses, lakehouses and streaming pipelines now commoditized. Proven use cases—Customer‑360, risk, fraud and regulatory analytics—consistently generate ROI, while speculative projects often bleed...
WisdomAI Launches Autonomous Analytics Agents to Automate Enterprise Workflows
WisdomAI unveiled its Analytics Agents, AI‑driven tools that not only generate insights but also execute approved actions across enterprise data stacks. The agents, built on the company’s Federated Agentic Intelligence platform, aim to automate routine business processes while preserving auditability...
Architecting Petabyte-Scale Hyperspectral Pipelines on AWS
The article outlines a petabyte‑scale hyperspectral data pipeline on AWS that moves raw sensor cubes from remote fields to queryable tables using an S3‑SQS‑Lambda‑Batch ingestion flow, aggressive S3 lifecycle tiering, and an Apache Iceberg medallion lakehouse. Edge containers on NVIDIA...
Confluent Launches Real‑Time AI Suite to Secure Streaming Data at Scale
Confluent, the data‑streaming pioneer now owned by IBM, rolled out new features in Confluent Intelligence and Confluent Cloud that unify the AI lifecycle, automate privacy controls and enable private connectivity to external models. The upgrades target the security and complexity...

Teradata Factory Offers an On-Prem Foundation for the Agentic Enterprise
Teradata announced the Teradata Factory, an on‑premises extension of its Autonomous Knowledge Platform built on Dell Technologies hardware. The solution unifies the full Teradata software stack—including AI Studio—under a single management plane and supports enterprise data warehousing, lakehouse, and GPU‑accelerated...
Dell Tech World 2026: Mazda Builds AI-Ready Data Foundation with Dell
Mazda Motor Corp. has deployed Dell PowerScale to unify its design, development, and CAD data into a single, scalable storage platform. The new infrastructure expands capacity from roughly 4 PB to 10 PB while cutting storage cost per unit by 90 percent....
When "Garbage In, Garbage Out" Gets It Wrong
In this episode, Terence Lee St. John, founder of Enly and lead author of the paper "From Garbage to Gold: A Data Architectural Theory of Predictive Robustness," explains why machine‑learning models can achieve state‑of‑the‑art performance even when trained on noisy,...
Databricks Unveils Real-Time Fraud Accelerator, Spark RTM Cuts Latency 92%
Databricks introduced a new Solution Accelerator that pairs Spark Real‑Time Mode with its Lakebase service to detect credit‑card fraud in under 300 ms, claiming up to 92% speed gains over Apache Flink and highlighting $33 billion annual losses from card fraud.
AI Success Depends on These Data Governance Metrics
Enterprises are realizing that traditional data‑governance dashboards, which focus on documentation and ownership, fall short for AI workloads. New metrics—such as lineage completeness, certified dataset usage, and pipeline observability—measure data trust at runtime, ensuring AI systems draw from reliable, up‑to‑date...
Informatica Launches Headless Data Services and Unified Agent Governance
Informatica, now a Salesforce subsidiary, introduced a headless version of its Intelligent Data Management Cloud and a unified Agent and Context Catalog. The move targets enterprise AI agents that need governed, context‑rich data without traditional UI constraints, addressing a survey‑found...
Washington Tightens AI Rules, Targeting Deepfakes and Data Governance
Washington announced a sweeping AI regulatory push, licensing Anthropic’s Mythos model and launching enforcement of the Take It Down Act to force platforms to delete non‑consensual deepfakes within 48 hours. The moves signal a new era of data‑centric oversight for...

Reviewing Azure OneLake: Unified Data Lake Architecture for Modern Solutions
Azure OneLake launches as a unified data lake platform that consolidates structured, semi‑structured, and unstructured data into a single logical repository. It natively blends lakehouse capabilities with Azure services such as Synapse, Fabric, and Power BI, delivering real‑time ingestion, robust governance...
DeepMind's Co‑Scientist AI Tackles Cancer Drug Discovery with Big‑data Analytics
Google DeepMind unveiled Co‑Scientist, a multi‑agent AI platform designed to accelerate biomedical research. In a pilot for acute myeloid leukemia, the system shortlisted 30 drug candidates, three of which showed promising activity in lab tests. The launch signals a new...

Re-Air: The Rise of the Citizen Developer: Solving Business Problems with Alteryx and AI with Andy Macmillan
In this re‑aired episode, Alteryx CEO Andy Macmillan discusses the evolution of the citizen developer—business users with enough technical skill to build data solutions—and how AI is reshaping that role. He explains Alteryx’s mission to democratize data preparation and analytics,...
Dell and Palantir Unveil On-Prem AI Operating System to Accelerate Enterprise Data Integration
Dell Technologies and Palantir announced a joint on‑premises AI operating system at Dell Technologies World. The solution combines Dell’s AI Factory hardware with Palantir’s Foundry and Ontology platforms to create a unified, governed semantic layer for sensitive enterprise data. The...
Denodo Launches AWS Integrations to Power Trusted Data Foundations for Agentic AI
Denodo announced native integrations with Amazon Web Services—SageMaker, Bedrock AgentCore and Quick—and placed its platform on the AWS Marketplace. The enhancements deliver zero‑copy, real‑time data with a unified semantic layer, aiming to accelerate agentic AI deployments in finance, healthcare, manufacturing...
Databricks Launches Analytics Engineer Learning Pathway to Upskill SQL Practitioners
Databricks announced the Analytics Engineer Learning Pathway, a curriculum that trains SQL practitioners to build governed, AI‑ready data models, pipelines and metric views. The program, available now on Databricks Academy, aims to fill a talent gap as organizations lean on...
Snowflake Adds Dataiku, Bedrock, Valid Systems to AI Cloud, Launches Risk Tools
Snowflake announced new AI Data Cloud integrations with Dataiku, Bedrock and Valid Systems, plus a risk‑tool offering that lets smaller banks run sophisticated fraud decisioning on Snowflake’s platform. The moves aim to lock AI workloads inside Snowflake and broaden access...
Snowflake Secures IRAP PROTECTED Status on Google Cloud, Expanding Australian Govt Access
Snowflake announced it has passed the Australian Signals Directorate's IRAP PROTECTED assessment for its Google Cloud Melbourne region, joining AWS and Azure in offering government‑grade security. The milestone gives federal agencies confidence to run sensitive analytics and AI workloads on...

How Insurer Aviva Migrated 1.3PB of Siloed Data to Become "AI-Ready" In 7 Months
Aviva completed a lift‑and‑shift migration of 1.3 petabytes of siloed data from Oracle Cloud to Snowflake in just seven months, creating a unified data platform. The new architecture underpins its AI initiatives, allowing the insurer to launch AI‑driven services such as...
Enterprise AI Stalls as Hidden “Pipeline Tax” Inflates Data‑movement Costs
EnterpriseDB’s CTO Quais Taraki warns that a hidden “pipeline tax” – the cumulative cost of moving data through multiple translation layers – is delaying AI projects by up to six months. The tax, invisible on balance sheets, is cited as...
SAP's $1.1B Data Push: Reltio Deal, Dremio Pending, Prior Labs AI Lab
SAP closed its $1.1 bn acquisition of master‑data specialist Reltio on May 7, announced a pending buyout of data‑lake platform Dremio, and committed more than €1 bn ($1.08 bn) to German AI startup Prior Labs. The three‑tier plan aims to unify data preparation, connectivity...
The AI Data Governance Gap that Keeps Getting Worse
Enterprises are rapidly embedding AI into products, but most overlook data governance. Production databases are routinely copied into dev environments, data lakes, and third‑party services without clear oversight, leaving real customer records exposed. The article cites a mid‑size lender where...
Amagi Cuts Costs 45% with Unified Data Lake on Databricks
Amagi, the global media‑technology provider, announced a 45% reduction in operating costs and faster product rollout after consolidating its fragmented data environment onto Databricks' lakehouse platform. The move resolves cross‑region governance challenges and creates a single source of truth for...
Snowflake Adds Dataiku, Bedrock, Valid Systems to AI Data Cloud, Backs OSI Push
Snowflake announced a suite of new native integrations—Dataiku Cobuild, Bedrock Data’s free governance tier and Valid Systems’ fraud‑decisioning tools—into its AI Data Cloud, while also championing the Open Semantic Interchange (OSI) standard to keep AI workloads on‑platform. The moves aim...