Today's Big Data Pulse

Leadership Gaps Hamper Data Engineering Teams, Survey Finds
Three 2026 surveys of 1,629 data professionals reveal organizational issues now dominate data‑engineering bottlenecks. In January, weak leadership direction and poor requirements accounted for 40% of top‑bottleneck votes, while by April 50% cited lack of clear ownership as the biggest pain point. Legacy systems and tooling were far lower priorities, at 25% and under 5% respectively.
Also developing:
By the numbers: Sensor Tower acquires AppMagic to expand SMB offering
Can a New AI-Powered Platform Help Police Close Cases?
Guillaume Delépine founded San Francisco‑based Longeye to use AI for sorting massive digital evidence, aiming to boost police case‑closure rates. The platform, now negotiating 20 contracts, ingests data such as phone records, emails and GPS to deliver searchable case summaries, and early pilots have uncovered key homicide and robbery clues. Critics caution that AI‑driven evidence searches could erode privacy and produce false leads, as illustrated by a mis‑flagged biblical passage. Longeye has raised $5 million, with pricing around $5,000 per year for a 20‑officer agency.
Databricks Pledges $850 M to Expand London HQ and Boost AI Data Ecosystem
Databricks announced an $850 million investment over three years to quadruple its London headquarters, grow its UK‑Ireland workforce to over 1,000 and train 100,000 data‑AI professionals. The move positions the company as a central hub for its Lakebase and Genie AI...
Delta Change Data Feed Deep Dive: Building Incremental Pipelines Without Complexity
Delta Lake’s Change Data Feed (CDF) lets engineers capture row‑level changes as soon as they occur, turning a Delta table into a built‑in change‑data‑capture engine. By enabling the table property delta.enableChangeDataFeed, only modified rows are read, eliminating costly full‑table scans for...

AI Will Replace Tableau and PowerBI with Instant Dashboards
AI is about to kill Tableau and PowerBI. Every dashboard can now be created in seconds with these Free Agents:

Redpanda Unveils Adaptable Streaming Engine to Eliminate ‘Streaming Sprawl’
Redpanda announced the general availability of its adaptable R1 streaming engine in Redpanda Streaming 26.1, a single‑modal platform that lets enterprises tailor performance, safety and cost at the topic level. The release integrates Cloud Topics, write caching, tiered storage, and Iceberg...

Window Functions Rank without Collapsing Rows
SQL tip: GROUP BY collapses your rows. Sometimes you need the ranking without losing the detail. That's what window functions do. PARTITION BY region restarts the ranking for each region. ORDER BY total_spend DESC puts the highest spender at rank 1. Every row stays intact....
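A minimal sketch of the tip, run through Python's built-in sqlite3 module so it is self-contained. The `customers` table, its column names other than `region` and `total_spend`, and all sample values are assumptions made for illustration; the teaser does not show the original query. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, total_spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "East", 900.0),
        ("Ben", "East", 400.0),
        ("Cai", "West", 700.0),
        ("Dee", "West", 1200.0),
    ],
)

# RANK() is computed per region; unlike GROUP BY, no rows are collapsed.
ranked = conn.execute(
    """
    SELECT name, region, total_spend,
           RANK() OVER (PARTITION BY region ORDER BY total_spend DESC) AS spend_rank
    FROM customers
    ORDER BY region, spend_rank
    """
).fetchall()

for row in ranked:
    print(row)
```

All four input rows come back, each carrying its rank within its own region.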

Generative BI Amplifies Your Foundation: Scale Intelligence, Not Chaos
Generative BI doesn’t accelerate everything. It compresses friction. But here’s the truth: it amplifies whatever it sits on top of. Strong foundation → intelligence at scale. Weak foundation → chaos at scale. That’s the inflection point. Article 2/4: https://t.co/E4BYVwebSq https://t.co/n2mvvezbl7

The Missing Layer in Europe’s AI Strategy: Data Ownership
European policymakers are pushing digital sovereignty, yet the missing piece is data ownership. As AI models become commoditized, control over the underlying data is emerging as the true competitive moat. Countly, an open‑source, self‑hosted analytics platform, illustrates how firms can...
CIOs Should Explore Domo’s No‑Code AI Workflow Suite
Domo is best known for its BI capabilities, but CIOs should take a fresh look at its no-code workflows, robust data integration capabilities, and emerging AI Catalyst for developing AI agents #cio #ai https://t.co/Rgkexj9XJP
StreamNative Becomes a Kafka Company with Lakehouse Foundation
It's April 1st and I have an announcement: @streamnativeio is a @apachekafka company now. Yes, the Pulsar people. We took Apache Kafka 4.2 and gave it a lakehouse foundation. Topics = Iceberg tables. 10x cheaper. Zero code changes. https://t.co/Waj3eO8BoZ
Oracle Deploys AI Data Platform to Federal Agencies, Unifying Data for Mission‑Critical AI
Oracle announced the Oracle AI Data Platform for U.S. federal agencies, a secure foundation that links generative AI models with agency data, applications, and workflows. The platform combines Oracle Cloud Infrastructure, Autonomous AI Database and Enterprise AI services to break...
Corewell Health’s Jarvie Says Population Health Data Challenges Demand Internal Builds
In this episode, Dr. Bob Jarvie, Associate CMIO and Medical Director for Population Health Analytics at Corewell Health, explains why the health system built its own internal population health data platform instead of relying on external vendors. He highlights the...
New Hampshire’s Secret Role in Northeast Public Health Data Consortium Raises Transparency Concerns
New Hampshire officials have been quietly involved in the Northeast Public Health Collaborative since its inception, even though the state was omitted from the public announcement. Internal emails reveal the state’s continued participation in leadership calls and data‑sharing initiatives, highlighting...
CFOs Unlock Expansion via Clean Data and AI
I really suck at software demos, but hopefully that didn't diminish my new Revenue Intelligence feature I demo'd today. I may be able to methodically explain a P&L, but I realized today that I need training on software demos. Can...

Wiliot Builds Its Physical AI Supply Chain Platform on Databricks to Operationalize Item-Level IoT Data
Wiliot announced a Built‑On partnership with Databricks, moving its Physical AI supply‑chain platform onto the Databricks lakehouse. The shift lets the company ingest and govern massive item‑level IoT Pixel data streams in a unified environment. By leveraging Databricks’ compute and...

Nomadic Raises $8.4 Million to Wrangle the Data Pouring Off Autonomous Vehicles
NomadicML raised an $8.4 million seed round at a $50 million post‑money valuation to commercialize its vision‑language platform that auto‑annotates autonomous‑vehicle video. The tool transforms terabytes of archived footage into searchable, structured datasets, enabling rapid identification of rare edge‑case events for training...

EDB Postgres AI for WarehousePG: Reclaiming Control of the Enterprise Data Warehouse
Enterprise data warehouses are increasingly seen as costly, inflexible assets, prompting a shift toward open‑source alternatives. EDB Postgres AI introduces WarehousePG, a PostgreSQL‑based, petabyte‑scale MPP warehouse that promises up to 58% lower total cost of ownership while delivering predictable performance. The...
IRS Pilots Palantir AI to Pinpoint High‑value Audits
The IRS is testing Palantir's AI-powered analytics platform to identify "highest-value" audit and investigation targets, documents obtained by Wired reveal. The pilot program aims to cut through decades of fragmented legacy systems to surface taxpayers most likely to be committing...

The Profisee 2026 R1 Release Brings Trusted Master Data to Any AI Tool
Profisee has announced general availability of its 2026 R1 release, positioning the cloud‑native MDM platform as an AI leader. The update introduces a Model Context Protocol (MCP) Server that creates an open standard for linking master data to AI tools such...

SAP’s Reltio Acquisition Forces A Choice For CIOs
SAP announced its acquisition of Reltio, integrating its master data management platform into SAP Business Data Cloud. The move gives SAP control over the enterprise master data layer, especially for customers with mixed SAP and non‑SAP environments. By embedding Reltio, SAP...

CIBO’s Data and Analytics Platform to Advance Ingredion’s Responsible Sourcing Initiatives
CIBO Technologies has entered a three‑year strategic partnership with Ingredion to expand regenerative agriculture across its supply chain. The collaboration will use CIBO’s data and analytics platform, including AI and computer‑vision tools, to enroll and support farmers in Iowa and...
Palantir Extends Five‑Year Deal with Stellantis, Adding AI Platform to Automotive SaaS Stack
Palantir Technologies announced a five‑year renewal and expansion of its partnership with Stellantis, adding the Palantir Artificial Intelligence Platform to the automaker’s existing Foundry deployment. The deal deepens Palantir’s foothold in automotive SaaS and comes as the company’s stock slipped...

Combine Multiple Aggregates in One Query Using CASE
SQL tip: You're running three separate queries to get this. SELECT SUM(amount) FROM orders WHERE user_type = 'premium'; SELECT COUNT(*) FROM orders WHERE is_first_order = TRUE; SELECT SUM(amount) FROM orders; You can get all three in one. This pattern works across Oracle, SQL Server, PostgreSQL, BigQuery...
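One way the three queries above could be folded into a single pass is conditional aggregation with CASE. The sketch below runs it through Python's built-in sqlite3 module; the sample rows and the output column aliases are assumptions for illustration, and the teaser's own combined query is not shown in the source.

```python
import sqlite3

# Hypothetical sample data; SQLite stores booleans as 0/1 integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, user_type TEXT, is_first_order INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (100.0, "premium", 1),
        (50.0, "free", 0),
        (25.0, "premium", 0),
        (10.0, "free", 1),
    ],
)

# Each CASE keeps a row's contribution only when its condition holds,
# so one scan of the table produces all three aggregates.
premium_revenue, first_orders, total_revenue = conn.execute(
    """
    SELECT SUM(CASE WHEN user_type = 'premium' THEN amount ELSE 0 END) AS premium_revenue,
           SUM(CASE WHEN is_first_order = 1 THEN 1 ELSE 0 END)         AS first_orders,
           SUM(amount)                                                 AS total_revenue
    FROM orders
    """
).fetchone()

print(premium_revenue, first_orders, total_revenue)
```

One difference from the separate queries: with ELSE 0, the premium sum comes back as 0 rather than NULL when no premium orders exist.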

AI Era Challenges Traditional Data Owner Role
Reserve your spot at Friday's Coffee with Digital Trailblazer. Our topic this week: Redefining Data Governance: Is the Data Owner Role Obsolete in the AI Era? https://t.co/i7NcU4uICI #AI https://t.co/9hxa4qwsuS

RSAC 2026: Commvault Extends Enterprise Resilience to Structured and AI Data with Real-Time Governance Controls
Commvault announced an expansion of its data security posture management (DSPM) to include structured data and AI‑driven vector databases, leveraging its recent acquisition of Satori. The new real‑time data access governance lets security teams monitor and control structured data usage,...

Old‑School ML Turns Messy Support Chats Into Actionable Insights
Great detailed write-up by Mariia explaining how we built topic modelling that turns surprisingly messy support chats into structured and actionable insights. A fun reminder that "traditional" ML (e.g. >4 years old) is still very useful. https://t.co/vo9KfPIg6q https://t.co/KFMGcHYHxN

Top Data Preparation Challenges and How to Overcome Them
The article covers seven common data‑preparation challenges, including poor profiling, missing or invalid values, name/address inconsistencies, cross‑system data mismatches, enrichment hurdles, and scaling issues, and offers practical ways to address each. It highlights that data preparation typically consumes the majority of effort in...

Elizabeth Garrett Christensen: Postgres Vacuum Explained: Autovacuum, Bloat and Tuning
PostgreSQL relies on periodic vacuuming to reclaim space from dead tuples created by its MVCC architecture and to prevent transaction ID wraparound. The built‑in autovacuum daemon, enabled by default, triggers when dead rows exceed a threshold of 50 rows plus...
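For reference, PostgreSQL's documentation describes the autovacuum trigger as a base threshold plus a scale factor times the table's row count. The sketch below assumes the documented defaults (`autovacuum_vacuum_threshold` = 50, `autovacuum_vacuum_scale_factor` = 0.2); the function names are hypothetical, and newer PostgreSQL versions may apply additional limits not modeled here.

```python
def vacuum_trigger_threshold(live_tuples, base_threshold=50, scale_factor=0.2):
    """Dead-tuple count above which autovacuum would vacuum a table.

    Defaults mirror autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor; both can be overridden per table.
    """
    return base_threshold + scale_factor * live_tuples


def needs_vacuum(dead_tuples, live_tuples, base_threshold=50, scale_factor=0.2):
    """True when dead tuples exceed the trigger threshold (hypothetical helper)."""
    return dead_tuples > vacuum_trigger_threshold(live_tuples, base_threshold, scale_factor)


# On a million-row table the default trigger sits at roughly 200,000 dead rows,
# which is why large, busy tables are often given a smaller scale factor.
print(vacuum_trigger_threshold(1_000_000))
print(needs_vacuum(dead_tuples=100, live_tuples=1_000_000))
```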
CorridorIQ Partners with Bonaventure to Deploy AI Across $2.8 B Multifamily Portfolio
CorridorIQ and Bonaventure announced a strategic partnership that will embed AI-driven migration intelligence across Bonaventure's $2.8 billion multifamily asset base. Co‑founders Zave Greene and Luke Anderson will serve as AI Entrepreneurs in Residence, with the firms expecting tens of millions of...
Data Stewards Must Be Domain Experts, Not Engineers
Data stewards don't need to be engineers. They need to be domain experts who can speak to data quality. #DataGovernance #DataSteward https://t.co/aQ7n0Kcc79

Palantir’s UK Boss Criticises ‘Ideological’ Groups as Ministers Move to Scrap NHS Contract
Palantir’s UK executive warned ministers against yielding to “ideologically motivated” campaigners as they consider invoking a break clause in the NHS’s £330 million (≈ $413 million) Federated Data Platform contract. The AI‑enabled platform is projected to generate £150 million (≈ $188 million) in benefits by 2030,...

‘Fragmentation Is Poison’: How Microsoft Is Targeting Disparate Data to Boost AI Adoption
Microsoft unveiled Database Hub and Fabric IQ at FabCon and SQLCon 2026, extending its Fabric SaaS analytics platform to unify roughly 20 data services under a single management plane. The new Database Hub adds AI‑driven, natural‑language exploration across Azure SQL, Cosmos DB, PostgreSQL and...
Government Datasets Are Poorly Labelled and Will Fail AI
The Open Data Institute’s four‑month NDL‑Lite prototype scanned more than 100,000 public datasets from six UK sources, exposing pervasive labeling gaps, outdated records, and accessibility hurdles. Notably, a major Home Office crime dataset has not been refreshed since 2018, while...
Google's 200M-Parameter Time-Series Foundation Model with 16k Context
Google Research released TimesFM 2.5, a decoder‑only time‑series foundation model with 200 million parameters, down from 500 million in version 2.0. The new model supports a 16k context window, far exceeding the prior 2,048 limit, and adds an optional 30‑million‑parameter quantile head for continuous...
The Power BI Crash That Sparked a Data Revolution at Dodge Industrial
Dodge Industrial’s reliance on a 20‑year‑old SAP BW system forced business users to build a shadow data warehouse in Power BI, which ultimately crashed the company’s entire tenant. The outage prompted Data & Analytics Manager Daniel Garrett to partner with Protiviti...

HealthcareWATCH
Within3 unveiled Dataverse, a unified real‑world data ecosystem that merges electronic health records, claims, and specialty analytics to sharpen pharmaceutical launch decisions. Avalere Health released a global framework to broaden genomic profiling in cancer care, while Emota’s report highlighted rising...

Blake Foster Appointed Head of Business Intelligence at HYBE America
HYBE America announced Blake Foster as its new Head of Business Intelligence, a role designed to centralize data strategy across U.S. operations. The former Warner Music Group senior vice president will build the company’s analytics infrastructure and turn artist‑generated data...
China Deploys AI‑Driven Smart Farming Across 3,800 Mu, Boosting Yields and Income
China’s Ministry of Agriculture and Rural Affairs has accelerated the rollout of AI‑enabled smart farming on more than 3,800 mu of fields in Jiangsu, using drones, IoT sensors and data‑analytics platforms. The initiative has lifted tea leaf quality by nearly 50%...

Bridging Worlds with Hammerspace and the Reality of Multi-Cloud Mobility
Hammerspace unveiled a Unified Global Namespace that abstracts storage across on‑prem, AWS, Azure and OCI, letting data appear locally wherever compute runs. Its policy‑driven Objective‑Based Data Orchestration moves only the required blocks, eliminating heavyweight migrations for AI and GPU‑intensive workloads....
Scaling Kafka Consumers: Proxy Vs. Client Library for High-Throughput Architectures
Apache Kafka’s pull‑based model excels for event‑driven microservices, but scaling consumer groups creates operational overhead, head‑of‑line blocking, and complex error handling. Large enterprises such as Wix and Uber have addressed these limits by deploying a centralized push‑based consumer proxy, achieving...
How Lumen Is Dismantling Decades of Network Complexity
Lumen Technologies, a $12.4 billion telecom operator with a 500,000‑mile fiber network, faced fragmented inventory from decades of acquisitions, operating over 17 legacy systems and nearly 500 data sources. It built a unified data layer and AI‑driven digital twin, launching the...
Turn Spreadsheets Into Live AI Dashboards Instantly
Fastest way to build a business dashboard with AI in 2026: 1) Take any spreadsheet you already use to track business data: revenue, leads, whatever you've been tracking manually. 2) Open Claude Code and tell it you want a live dashboard...
Orchestrating and Designing Data Collaboratives: What Governance Model Is Fit for Purpose?
Stefaan Verhulst’s paper surveys the surge of data‑governance models—data trusts, commons, cooperatives, intermediaries, unions, sandboxes and data spaces—and argues they are not competing solutions but purpose‑driven responses to distinct coordination challenges. He proposes a typology of seven governance archetypes, each...
A Developer’s Guide to Integrating Embedded Analytics
Embedding analytics directly into applications is rapidly becoming a strategic priority for software vendors, as 78% of tech leaders plan to boost BI investments. Developers must decide between building custom visualizations or buying a third‑party platform such as Tableau, Power BI,...

Snowflake Intelligence for Retail: Scaling Enterprise AI
The Mark Anthony Group (MAG) has moved from a traditional data warehouse to Snowflake Intelligence, turning its data platform into a generative business intelligence engine. By mandating Snowflake Secure Data Sharing in vendor RFPs, MAG streamlined real‑time data integration and...
Lakebase Postgres Powers Agents with Human‑Speed Data Access
@JeffDean says it best: the problem in this new agentic era is "tools designed for human speed interaction". That's why we think agents love Lakebase Postgres: it can branch, snapshot, scale up and down in a second, orders of magnitude...
Stateless Semantic Search in BigQuery Simplifies Small Datasets
You have any giant, convoluted code or SQL logic to handle data values that might be similar? @JeffONelson did. But he shows off a new stateless semantic search in @googlecloud BigQuery that might be a lifesaver for small datasets. https://t.co/RU5q8SoJb4

Agent Skills: Disseminating Expertise
dbt Labs unveiled a suite of eight AI agent skills that automate complex dbt tasks, including a migration from dbt Core 1.10 to Fusion that completed without human intervention. These skills distill hundreds of hours of community expertise into concise...
Palantir's Blockbuster Earnings Week Fueled by DoD, FCA & Golden Dome Wins
Palantir Technologies posted a surprise earnings beat and announced three marquee contracts—a Department of Defense program‑of‑record designation for its Maven system, a UK regulator pilot, and a software role in the $185 bn Golden Dome missile‑defense initiative—signaling accelerating AI‑driven revenue growth...
Alpha School’s $65K AI Curriculum Promises 2‑4× Faster Learning
Alpha School, a private network with campuses in Austin, Miami, San Francisco and New York, has launched a data‑centric curriculum where students spend two hours a day with AI tutors for $65,000 a year. The company says the model accelerates...