Today's Big Data Pulse

CRN’s 2026 Big Data 100 maps $31.8B market surge and AI‑driven vendor moves
The report projects the big‑data market to reach $31.8 billion in 2026, driven by rapid AI adoption and new LLM‑enabled analytics. It highlights Alteryx’s $4.4 billion private‑equity buyout, AtScale’s Snowflake‑led financing round, and Hex’s $70 million Series C funding.

Understanding the Layers of the AI‑ready Modern Data Stack
Enterprises are rapidly replacing legacy data architectures with an AI‑ready modern data stack as AI initiatives surge. Deloitte’s 2026 survey shows strategic AI readiness rose to 42%, but confidence in data‑management capabilities slipped to 40%, while an IDC study found 84% of firms still run outdated storage. The new stack is organized into seven layers—from ingestion through consumption—emphasizing hybrid‑cloud design, lakehouse unification, and zero‑trust governance. Vendors are now pressured to deliver semantic, policy‑as‑code, and multi‑cloud capabilities to support trustworthy AI.
AI Fails without Clean, Documented, Owned Data
Most companies experimenting with AI are not struggling with models. They’re struggling with messy internal data, inconsistent schemas, missing documentation, and no clear data ownership. You can’t plug OpenAI into chaos and expect magic. Data hygiene is a prerequisite for AI.
Planet Labs Posts 26% Revenue Rise and First Annual EBITDA Profit in Q4 2026
Planet Labs announced $307.7 million total revenue for 2026, a 26% year‑over‑year increase, and $86.8 million in Q4, up 41% YoY. The company posted its first full‑year adjusted EBITDA profit of $15.5 million and generated $52.9 million in free cash flow, driven by expanding...
From DLT to Lakeflow Declarative Pipelines: A Practical Migration Playbook
Databricks is rebranding Delta Live Tables as Lakeflow Spark Declarative Pipelines, adding open‑source Spark alignment and new features. Existing DLT pipelines run unchanged, but Databricks recommends updating imports, decorators, expectations, and CDC logic to the new `dp` API. The migration...

How to Build an Effective Big Data Strategy
Smart organizations leverage big data to boost performance, but without a clear strategy they risk duplicated projects, compliance breaches, and wasted spend. The article outlines a four‑step framework—defining business goals, assessing data readiness, prioritizing use cases, and creating a flexible...
iQIYI Repurchases $207.8 Million of Convertible Notes, Leaves $259K Outstanding
iQIYI finished a $207.8 million repurchase of its 6.50% convertible senior notes due 2028, leaving only $259,000 of principal outstanding. The move reduces the company's debt load but comes as its shares trade near a 52‑week low and analysts flag a...
LightningChart Introduces No-Code Visualization Platform Dashtera
LightningChart unveiled Dashtera, a no‑code, web‑based analytics platform that leverages GPU‑accelerated rendering to display up to 100 million data points in real time. The solution removes the need for extensive implementation projects, data reduction, or custom integration, delivering instant zoom and...

Informatica Adds Microsoft Fabric Support and Opens Swiss Data Center
Informatica announced general availability of Microsoft Fabric Open Mirroring within its Intelligent Data Management Cloud (IDMC) and launched a new Azure‑based IDMC delivery point in Switzerland. The Open Mirroring feature lets customers synchronize data between OneLake and Fabric Data Warehouse...

Master the 10 Essential Clustering Techniques
A rundown of the 10 types of clustering that every data scientist needs to know.
CollectForU Expert and Debt Hunter Reveal 70% of Hong Kong SMEs Lack Credit Defenses
Credit‑management firms CollectForU Expert and Debt Hunter released a joint report on March 16 showing more than 70% of Hong Kong SMEs lack solid credit‑defense mechanisms, leaving them vulnerable to liquidity strain. The study flags the 90‑day delinquency mark as...

Interview: Huy Dao, Director of Data and Machine Learning Platform, Booking.com
Booking.com’s data and machine‑learning platform, led by Huy Dao, has completed a seamless migration from on‑prem Hadoop to a Snowflake‑based cloud ecosystem. The new Booking Data Exchange serves over 1,500 practitioners, handling petabytes of data and billions of daily predictions...
Dagster: Asset‑First Orchestration Over Task‑Centric Pipelines
Dagster has a steep learning curve, but it pays off. It is the Vim of orchestration. The mental model shift: Dagster thinks in assets, not tasks. You define what data should exist, not what steps to run. The engine figures out dependencies and...
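To make the asset-first idea concrete, here is a minimal sketch in plain Python. It is not Dagster's API: the `asset` decorator, the `ASSETS` registry, and `materialize` are hypothetical names invented for illustration, and the dependency ordering is delegated to the standard library's `graphlib`. Each function declares which data it needs, and the run order is derived from those declarations rather than written out as steps.

```python
from graphlib import TopologicalSorter

# Hypothetical registry (not Dagster's API): asset name -> (upstream names, compute function).
ASSETS = {}

def asset(*deps):
    """Register a function as an asset that depends on the named upstream assets."""
    def register(fn):
        ASSETS[fn.__name__] = (deps, fn)
        return fn
    return register

@asset()
def raw_orders():
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]

@asset("raw_orders")
def cleaned_orders(raw_orders):
    return [o for o in raw_orders if o["amount"] > 0]

@asset("cleaned_orders")
def revenue(cleaned_orders):
    return sum(o["amount"] for o in cleaned_orders)

def materialize(target):
    """Build `target` and everything upstream of it, in dependency order."""
    # Walk upstream from the target to collect only the assets it needs.
    needed, stack = {}, [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            deps, _ = ASSETS[name]
            needed[name] = set(deps)
            stack.extend(deps)
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    order = list(TopologicalSorter(needed).static_order())
    results = {}
    for name in order:
        deps, fn = ASSETS[name]
        results[name] = fn(*(results[d] for d in deps))
    return order, results

order, results = materialize("revenue")
```

Asking for `revenue` is enough: nothing in the calling code says that `raw_orders` must run before `cleaned_orders`. That ordering falls out of the declared dependencies, which is the shift from tasks to assets in miniature.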
Databricks' Genie Code Automates Data Science and Engineering
I shared my thoughts with @Infoworld on the new Genie Code from @Databricks https://t.co/54nQ6q4vAQ The goal is to highly automate data science and engineering tasks.
SAPinsider Las Vegas: Why Data Strategy Must Start With Trust
At SAPinsider Las Vegas 2026, Ingo Hilgefort warned that data‑driven AI projects fail when organizations lack trust in their data. He argued that inconsistent definitions and poor governance cause users to rebuild dashboards to verify numbers, stalling analytics adoption. Hilgefort...

How a Nonprofit Transforms Data with Cloudera and AI
Rare Hope, a nonprofit focused on rare‑disease hypotheses, adopted Cloudera’s hybrid data‑and‑AI platform to turn unstructured research papers and medical images into structured insights. Using PySpark pipelines, the organization extracts disease‑drug correlations and feeds them to large language models for...

Federal AI Needs a New Data Foundation. Dell’s Platform Is Built for It.
The federal government is accelerating its adoption of generative AI, retrieval‑augmented generation, and early agentic systems, but agencies are constrained by legacy data architectures. Dell’s AI data platform offers a secure, federated foundation that lets classified and regulated data remain...

Taming the IoT Firehose: How Utilities Are Scaling Cloud DataOps for Smart Metering
Utilities are grappling with an "IoT firehose" as smart meters generate massive, continuous telemetry streams. To tame the volume, they are adopting cloud‑based DataOps frameworks that automate ingestion, normalize data, and deliver analytics‑ready datasets at scale. Automated, event‑driven pipelines enable...
Universal Semantic Layer Needed for Multi-Tool Data Access
The semantic layer isn't new. SAP BusinessObjects had one in 1991. What's new is the need for a universal semantic layer that works across BI tools, notebooks, and applications. When you only had one BI tool, that tool's semantic layer was enough....

Microsoft Promises All-in-One Database Wrangling Hub on Fabric
Microsoft unveiled Database Hub, an early‑access tool built on the Fabric data platform that consolidates management of Azure SQL Server, Cosmos DB, PostgreSQL, MySQL, Azure Arc‑enabled SQL, and other services. The hub offers a single pane of glass for on‑premises,...

Lloyd's Register, OneOcean Report Warns Shipping Must Master Data to Remain Competitive
Lloyd’s Register and OneOcean released a report warning that the maritime sector’s surge in operational data is hampered by fragmentation and low standardisation, jeopardising compliance and commercial advantage. Their Digital Maturity Index shows data standardisation at 2.45 / 4 while overall digital...

Oracle Announces General Availability of Oracle Analytics Server 2026
Oracle announced the general availability of Oracle Analytics Server 2026, delivering a suite of enhancements aimed at boosting adoption, performance, and governed self‑service. New defaults for the "Limit Values By" filter and a redesigned State menu streamline workbook interactions. The...

DuckDB, AI, and the Future of Data Engineering
In this episode, Dan Beach chats with State Farm staff engineer Matt Martin about his journey from industrial engineering to data engineering, his deep involvement with DuckDB, and the evolving landscape of data platforms. Matt shares how early automation with...

Nvidia GTC 2026: DDN Launches IndustrySync Pipelines for Financial Services and Life Sciences AI
DDN announced IndustrySync Pipelines, pre‑integrated AI data workflows for Financial Services and Life Sciences, deployable on its HyperPOD platform in days instead of months. The Financial Services pipeline promises up to 150× faster risk simulations and five‑minute risk metric refreshes,...

DataOps Engineers: The Underrated Backbone of AI Efficiency
The most underrated AI role right now: DataOps Engineer. Not the ML engineer. Not the data scientist. The person who designs automation and testing infrastructure that makes everyone else dramatically more effective. Infrastructure that runs without you. That's the whole job. https://t.co/Cng5iC1BEB

GHD Appoints David McLaren to Lead Data and AI Capabilities Globally
GHD has appointed David McLaren as its Enterprise Data & AI Leader, based in Toronto. McLaren brings experience from Coca‑Cola Canada Bottling, where he built enterprise‑scale data platforms, automation and governance. At GHD he will steer the development of an...
Nigerian Firms Chase Data Analytics Skills as 8% Revenue Boost Spurs Demand
Nigerian companies are rapidly adopting data analytics, motivated by research showing an average 8% revenue increase for firms that use analytics tools. The shift is creating a talent crunch as businesses, from banks to retailers, scramble to upskill staff and...

Data Lineage Documentation Matters for Enterprise Reliability
Enterprises are increasingly recognizing that knowing where data resides is insufficient without visibility into its lifecycle. Data lineage—tracking origin, transformations, and access—provides the transparency needed for accountability, data quality, compliance, and reduced technical debt. The article highlights how poor lineage...
Ibrar Ahmed: RAG With Transactional Memory and Consistency Guarantees Inside SQL Engines
Current retrieval‑augmented generation (RAG) systems were built for static document search, which creates consistency problems when multiple agents write concurrently. Without transactional control, memory updates can become partially committed, leading to answer drift and silent corruption. The article proposes using...
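The partial-commit problem described above can be illustrated with a small sketch. This is not the article's proposed design, which the summary does not spell out. It only shows the general idea of wrapping a group of memory writes in one database transaction, using Python's built-in `sqlite3` as a stand-in SQL engine. The `memory` table, its keys, and the `update_memory` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL engine; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.execute("INSERT INTO memory VALUES ('customer_tier', 'silver')")
conn.execute("INSERT INTO memory VALUES ('discount', '5%')")
conn.commit()

def update_memory(conn, updates):
    """Apply all updates atomically; roll back every change if any one fails."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            for key, value in updates:
                conn.execute(
                    "UPDATE memory SET value = ? WHERE key = ?", (value, key)
                )
        return True
    except sqlite3.IntegrityError:
        return False

# The second write violates NOT NULL, so the first write must not survive either.
ok = update_memory(conn, [("customer_tier", "gold"), ("discount", None)])
snapshot = dict(conn.execute("SELECT key, value FROM memory"))
```

Because both writes share one transaction, the failed second update also undoes the first, so a reader never sees `customer_tier` set to `gold` alongside a stale `discount`. Handling concurrent writers across agents would additionally depend on the engine's isolation levels, which this single-connection sketch does not exercise.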
Nvidia‑Backed Starcloud Seeks FCC Approval for 88,000‑Satellite AI Data Center Constellation
Redmond‑based Starcloud, an Nvidia‑backed startup, filed an FCC application on March 16, 2026, to deploy up to 88,000 low‑Earth‑orbit satellites that would act as orbital data centers for AI workloads. The plan envisions a dusk‑dawn, sun‑synchronous constellation operating between 600...
Nvidia Unveils Groq 3 Inference Chip to Power Multi‑Agent AI at GTC 2026
On March 16, 2026 at its GTC conference in San Jose, Nvidia announced Groq 3, a dedicated inference processor built on technology licensed from Groq Inc. The chip arrives in 256‑LPU LPX server racks with 128 GB of solid‑state RAM and 40 PB/s...
Nvidia Unveils $1 Trillion AI Roadmap, Vera CPUs & BlueField‑4 Storage at GTC 2026
On March 16, 2026, Nvidia CEO Jensen Huang announced at the GTC developer conference in San Jose that the company expects $1 trillion in AI chip orders through 2027, unveiled the Vera Rubin CPU/GPU platform, and introduced the BlueField‑4 STX reference...
IBM Finalizes $10 B Confluent Deal, Making Real‑Time Data Core of Enterprise AI
On March 18, 2026, IBM announced the completion of its $10 billion acquisition of data‑streaming platform Confluent, cementing the deal in the United States. The transaction gives IBM full ownership of Confluent’s Apache‑Kafka‑based technology, which IBM says will become the engine...

Intelligence and Interoperability: Data Catalog Must-Haves for AI Data Governance
Enterprises must move beyond static data catalogs toward a universal AI catalog that combines a business‑friendly semantic layer with cross‑platform interoperability. The semantic layer supplies machine‑readable context, preventing misinterpretations by AI agents, while universal interoperability ensures governance, security, and metadata...

IBM Joins Data Platform Race with Confluent Acquisition
With its acquisition of Confluent, IBM follows up on the Fivetran, Databricks, and Snowflake stack. Or what do you think? With the latest acquisition in data engineering, it's a race to see who gets the most complete data platform...
Orchestration Turns Data Stack Flexibility Into Cohesion
The Modern Data Stack promised best-of-breed tools that work together seamlessly. The paradox: the more tools you pick, the more integration work you create. One perspective I find helpful: Orchestration as the connective tissue. A good orchestrator doesn't just schedule jobs -...

Datadobi Announces Early Access Program for Data Access Review
Datadobi has launched an Early Access Program for Data Access Review, a new permissions‑intelligence capability for its StorageMAP platform. The feature adds visibility into who can access unstructured data, helping organizations spot excessive, outdated, or inappropriate rights. Selected current StorageMAP...

IBM Acquires Confluent to Power Real‑time Enterprise AI
.@IBM Completes Acquisition of Confluent, Making Real Time Data the Engine of Enterprise AI and Agents https://t.co/QqwqJPCT4P >> Congrats. A key augmentation for the IBM AI capabilities. Good news for customers. #NextGenApps https://t.co/aCKH7wuesW

Databricks, Accenture Launch Joint Business Venture Focused On Spurring AI Development
Databricks and Accenture have launched the Accenture Databricks Business Group, a joint venture designed to accelerate enterprise adoption of the Databricks Data Intelligence Platform for AI and data workloads. Backed by more than 25,000 Databricks‑trained professionals, the group will help...

Agentic AI Is Forcing Analytics and Operations to Converge
Investments in data platforms have shifted from siloed warehouses to unified, sovereign foundations as agentic AI collapses analytics, operations, and AI into single workflows. Enterprises now need platforms that govern operational execution, high‑concurrency analytics, and AI reasoning together, rather than...
Better Cotton Funds On-Farm Data-Collecting Project
The Better Cotton Initiative (BCI) is launching a $200,000 on‑farm data‑collection effort in partnership with the Soil Health Institute and ag‑tech provider Growers Guide. The program will analyze soil, plant tissue and sap samples across the Southeast and other Cotton Belt...

Big Changes in Latest GigaOm Unstructured Data Management Radar Report
GigaOm released version 6 of its Unstructured Data Management Radar, expanding the vendor set to 23 and appointing James Brown as the new analyst. The report reclassifies 11 suppliers as leaders and 12 as challengers, with notable moves such as Panzura shifting...

Day 44: Real-Time Monitoring Dashboard with Kafka Streams
The post walks through building a production‑grade real‑time monitoring dashboard that ingests over 40,000 events per second using Kafka Streams. It shows how windowed aggregations, percentile calculations, and anomaly detection run on RocksDB‑backed state stores with exactly‑once guarantees. The stream...
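As a rough illustration of what a windowed aggregation with a percentile and an anomaly flag computes, here is a minimal batch sketch in plain Python. It is not Kafka Streams code (that API is Java-based) and it has no state stores or exactly-once semantics. The function names, the event shape `(timestamp_ms, latency_ms)`, and the fixed `threshold_ms` rule used as a stand-in for anomaly detection are all assumptions made for the example.

```python
import math
from collections import defaultdict

def tumbling_windows(events, window_ms):
    """Group (timestamp_ms, latency_ms) events into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for ts, latency in events:
        window_start = ts - (ts % window_ms)  # align to the start of the window
        windows[window_start].append(latency)
    return dict(windows)

def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(events, window_ms=1000, threshold_ms=500):
    """Per-window count and p95, flagging windows whose p95 exceeds the threshold."""
    summary = {}
    for start, latencies in sorted(tumbling_windows(events, window_ms).items()):
        p95 = percentile(latencies, 95)
        summary[start] = {
            "count": len(latencies),
            "p95": p95,
            "anomaly": p95 > threshold_ms,  # simple threshold rule, assumed for illustration
        }
    return summary

events = [(100, 20), (400, 35), (900, 30), (1200, 25), (1700, 900), (1900, 40)]
summary = summarize(events)
```

With one-second windows, the first three events land in the window starting at 0 and the rest in the window starting at 1000, where the single 900 ms latency pushes the p95 over the threshold. A streaming engine would keep the per-window state incrementally instead of recomputing over a list, but the per-window result has the same shape.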
Noémi Ványi: We Skipped the OLAP Stack and Built Our Data Warehouse in Vanilla Postgres
Xata built a product analytics warehouse using vanilla Postgres, consolidating identity, usage, billing, and event data from four separate systems. They employed materialized views, pg_cron schedules, and database branches to flatten JSONB events, refresh data daily, and iterate safely on...
Visualizing the World with Planetary Computer
Microsoft’s Planetary Computer offers a free, standards‑based geospatial data platform that aggregates curated datasets from government, academic and commercial sources. It provides STAC‑compatible APIs, Python and R SDKs, and an Explorer UI for rapid prototyping of environmental applications such as...

Coles Sets up Standard Data Streaming Platform Groupwide
Coles Group has deployed an enterprise‑wide data streaming platform built on Confluent Cloud, unifying its real‑time data pipelines under a single Apache Kafka foundation. Previously, isolated event‑streaming stacks created silos, inconsistent models, and governance challenges. The new "enterprise event platform"...
IBM, Nvidia Tackle AI Data Woes
IBM expanded its partnership with Nvidia at GTC 2026 to address enterprise AI data management challenges. The collaboration integrates Nvidia’s cuDF toolkit with IBM’s Presto query engine and adds Nemotron models to IBM’s Docling PDF reader. Nvidia GPUs will also power...

Free Datasets + LLM Queries on Snowflake, BigQuery
Snowflake and BigQuery have free datasets you can use to practice SQL with real data. Even better: LLMs are integrated, so you can query in natural language.
AI Adoption Demands Stronger, More Responsive Data Foundations
As AI moves to core operations, pressure on the data layer also intensifies. I canvassed leaders on the work required to build a well-functioning data environment responsive to today’s AI initiatives. (My latest in Database Trends) https://t.co/X8ar2pKnTZ @BigDataQtrly

Nvidia Plans to Make All Unstructured Data Structured
Nvidia announced a plan to structure hundreds of zettabytes of unstructured data each year, turning it into the ground‑truth foundation for artificial intelligence. The initiative relies on confidential computing, ensuring that even the platform operator cannot view the raw data....
Online Feature Store for AI and Machine Learning with Apache Kafka and Flink
Wix.com has built a real‑time online feature store using Apache Kafka and Apache Flink to power personalized recommendations for its 200 million users. The architecture streams over 70 billion events per day through 50,000 Kafka topics, with Flink SQL performing low‑latency transformations and...