
Immuta launches Agentic Data Access module for AI agents
Immuta unveiled an Agentic Data Access module that lets autonomous AI agents retrieve enterprise data in real time while enforcing governance policies. The module treats agents as first‑class data users, applying least‑access and zero standing privileges and providing audit trails, all built on Immuta’s policy engine.
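The "agents as first-class data users" model can be sketched in a few lines. This is a minimal stdlib illustration of least access, zero standing privileges, and audit trails; the policy table, function names, and grant shape are invented for this sketch and are not Immuta's actual API.

```python
import time
import uuid

AUDIT_LOG = []

# Hypothetical policy table (illustrative only, not Immuta's policy engine):
POLICIES = {
    ("support-agent", "customer_tickets"): "allow",
    ("support-agent", "payroll"): "deny",
}

def request_access(agent_role, dataset, ttl_seconds=60):
    """Zero standing privileges: each read gets a fresh, short-lived,
    audited grant instead of a permanent role."""
    decision = POLICIES.get((agent_role, dataset), "deny")  # least access: default deny
    grant = {
        "grant_id": str(uuid.uuid4()),
        "agent_role": agent_role,
        "dataset": dataset,
        "decision": decision,
        "expires_at": time.time() + ttl_seconds,
    }
    AUDIT_LOG.append(grant)  # every decision leaves an audit trail
    return grant

allowed = request_access("support-agent", "customer_tickets")
denied = request_access("support-agent", "unknown_dataset")
```

The key property is that no grant outlives its TTL and every decision, including denials, is recorded.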

Nvidia announced a plan to structure hundreds of zettabytes of unstructured data each year, turning it into the ground‑truth foundation for artificial intelligence. The initiative relies on confidential computing, ensuring that even the platform operator cannot view the raw data. Partnerships with Google Cloud, IBM and Nestlé showcase GPU‑accelerated pipelines that cut processing time from 15 minutes to three minutes and deliver up to 83% cost savings. By converting massive data volumes into structured formats, Nvidia aims to unlock new AI‑driven value across industries.
Wix.com has built a real‑time online feature store using Apache Kafka and Apache Flink to power personalized recommendations for its 200 million users. The architecture streams over 70 billion events per day through 50,000 Kafka topics, with Flink SQL performing low‑latency transformations and...
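The core of an online feature store is that a feature value is updated and readable the moment an event arrives. A toy, in-process sketch of one such feature (events per user in a sliding window); at Wix scale this would be a windowed Flink SQL aggregation over Kafka, not a Python dict:

```python
from collections import defaultdict, deque

class WindowedCounter:
    """Toy online feature: how many events a user produced in the
    last `window` seconds. Illustrative only."""

    def __init__(self, window):
        self.window = window
        self.events = defaultdict(deque)

    def observe(self, user_id, ts):
        q = self.events[user_id]
        q.append(ts)
        while q and q[0] < ts - self.window:  # evict events older than the window
            q.popleft()
        return len(q)  # feature value is available immediately after the event

fc = WindowedCounter(window=60.0)
fc.observe("u1", 0.0)
fc.observe("u1", 10.0)
latest = fc.observe("u1", 70.0)  # the event at t=0 has aged out of the window
```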

We just turned on some new smarts in the @googlecloud BigQuery Studio interface. Now you get context-aware query editing (sees open query tabs), better resource discovery through natural language questions, and smarter troubleshooting. https://t.co/9ekJhzv0Ki https://t.co/SNntL6X6bB

Polars’ new streaming engine offers a single‑node, Rust‑based alternative to heavyweight distributed frameworks like Spark. By applying lazy query optimisation and batch‑wise materialisation, it delivers low‑latency ETL pipelines while dramatically cutting hardware costs. Early adopters have swapped Spark jobs for...
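Lazy query optimisation and batch-wise materialisation boil down to a pull-based pipeline: nothing runs until the consumer asks for results. A conceptual stdlib sketch of that evaluation model (this is not Polars' engine; in Polars the equivalent is `pl.scan_csv(...).filter(...).select(...).collect()`):

```python
def scan(rows):
    # Lazy "scan": no row is produced until the consumer pulls it.
    yield from rows

def filter_step(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row

def select_step(rows, cols):
    for row in rows:
        yield {c: row[c] for c in cols}

# The pipeline is declared up front but only materialized on demand,
# so intermediate results never exist as full in-memory tables.
data = [{"id": i, "amount": i * 10} for i in range(1000)]
pipeline = select_step(
    filter_step(scan(data), lambda r: r["amount"] > 9950),
    ["id"],
)
result = list(pipeline)  # materialization happens here
```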
In this episode, Tim talks with Gunnar Morling, a principal technologist at Confluent and a key contributor to projects like Hibernate and Debezium, about his "One Billion Row Challenge"—a viral coding contest he launched for the Java community in January...

Step-by-Step Data Engineer Roadmap (2026 Edition). Most people try to become Data Engineers by collecting courses. That rarely works. What you actually need is a sequence: a progression that builds real capability. Here's a practical 6-stage roadmap that takes you from foundation → job-ready 👇
I think about data modeling patterns in four main categories: 1. Dimensional modeling (Kimball) - optimized for queries 2. Data Vault - optimized for auditability and change 3. One Big Table - optimized for simplicity 4. Medallion Architecture - optimized for incremental refinement No pattern...
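As a concrete contrast for pattern 1, here is a minimal dimensional (Kimball-style) example in SQLite: a narrow fact table joined to a descriptive dimension, which is exactly the shape that makes aggregation queries cheap. Table and column names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Dimensional modeling: facts hold measures, dimensions hold descriptions.
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.category ORDER BY p.category
""").fetchall()
```

In One Big Table, `category` would simply be denormalized onto every sales row; in Data Vault, the product/sale relationship would live in its own link table with load metadata.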

The post walks readers through turning a complex, distributed log‑processing stack—collectors, RabbitMQ, query engines, and storage—into a single Kubernetes deployment. By providing complete manifests, it shows how to launch the entire ecosystem with one command, while Kubernetes handles health checks,...

SELECT, FROM, WHERE and JOINs will get you started. Then the work gets complicated and you realise tutorial SQL and production SQL are two very different things. Here's level 2: CTEs, for readability. I was lost in my own nested subqueries. Couldn't follow...
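A CTE names an intermediate result so the query reads top-to-bottom instead of inside-out. A small runnable example (via SQLite; the table and data are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('a', 50), ('a', 70), ('b', 20);
""")
# The CTE replaces what would otherwise be a nested subquery in FROM.
rows = con.execute("""
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer FROM customer_totals WHERE total > 100
""").fetchall()
```

The same logic as a subquery would bury the GROUP BY inside the outer SELECT; with the CTE, each step has a name you can read and reuse.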
The authors introduce data repurposing as the practice of applying existing datasets to tasks that were not envisioned at collection time. They differentiate repurposing from traditional data reuse, emphasizing new analytical goals and contextual shifts. A structured framework is presented,...

A real AWS Data Science pipeline looks like this:
Raw data → S3
ETL → AWS Glue
Query → Athena
Training → SageMaker
Deployment → Endpoints
Monitoring → CloudWatch
Add streaming with Kinesis and orchestration with Step Functions, and you have a full production ML platform. This is...
Though the US military's big data initiative Maven has sped up the planning of strikes for years, the comments suggest that generative AI is now adding a new interpretative layer to such deliberations.

digna announced a twelve‑month enterprise data‑warehouse deployment that operated without any traditional, manually coded data‑quality rules, relying instead on AI‑driven anomaly detection. The platform replaced thousands of null checks, threshold controls, and custom SQL assertions with statistical learning models that...
Data lineage traces your data's journey from source to destination. Where did this number come from? What would break if I changed this table? Who's using this data? Good lineage answers these questions. Bad lineage makes you grep through code. Tools like dbt...
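"What would break if I changed this table?" is a graph traversal over table dependencies. A toy sketch, with a hand-written lineage map standing in for what tools like dbt derive automatically from your SQL:

```python
# Toy lineage graph: table -> the tables it reads from (its upstreams).
UPSTREAM = {
    "revenue_dashboard": ["fct_orders"],
    "fct_orders": ["stg_orders", "stg_payments"],
    "stg_orders": ["raw_orders"],
    "stg_payments": ["raw_payments"],
}

def downstream_of(table):
    """Everything that transitively depends on `table`,
    i.e. everything that could break if it changes."""
    hit = set()
    frontier = [table]
    while frontier:
        t = frontier.pop()
        for child, parents in UPSTREAM.items():
            if t in parents and child not in hit:
                hit.add(child)
                frontier.append(child)
    return sorted(hit)

impact = downstream_of("raw_orders")
```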

Sema4.ai announced the general availability of its AI‑powered Semantic Layer at the Gartner Data & Analytics Summit 2026. The platform lets business users query databases, spreadsheets and documents using plain English, eliminating the need for SQL expertise. It couples a...
Have you tried out pipe syntax instead of traditional SQL? I've only messed around with it a bit. I can see how it's an improvement for different types of queries. This post shows you how to try it out (at no...

Berlin‑based Tower announced a €5.5 million raise across pre‑seed and seed rounds, led by DIG Ventures and Speedinvest. The startup offers a unified storage‑compute platform that lets data engineering teams retain full data ownership while accelerating AI‑driven pipeline development. Leveraging Apache...

Companies are rapidly expanding analytics and AI capabilities, but a new Info‑Tech Research Group study reveals that low data trust is throttling expected business value. Fragmented ownership, inconsistent validation and reactive cleanup dominate current data practices, leading to underperforming analytics...

The post outlines a production‑grade state management layer built on Kafka log‑compacted topics, featuring a keyed state producer, a consumer that materializes current snapshots, and a Redis‑backed query API. By retaining only the latest record per entity key, log compaction...
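The materialization step can be sketched in a few lines: replay a keyed changelog and keep only the latest value per key, which is exactly the state a consumer of a log-compacted Kafka topic converges to. The Redis-backed query API from the post is omitted here, and the record shapes are invented:

```python
def materialize(log):
    """Replay a keyed changelog into a current-state snapshot.
    A None value is a 'tombstone' that deletes the key, mirroring
    Kafka log-compaction semantics."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value  # latest record per key wins
    return state

log = [
    ("user:1", {"name": "Ada"}),
    ("user:2", {"name": "Bob"}),
    ("user:1", {"name": "Ada L."}),  # update supersedes the first record
    ("user:2", None),                # tombstone removes user:2
]
snapshot = materialize(log)
```

Compaction means the broker may eventually retain only the last two effective records, but any replay from the topic still yields this same snapshot.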

The article demonstrates how to use the sqlpackage command‑line utility to detect schema drift between Azure SQL databases by comparing a DACPAC file against a target database and generating a delta script. It outlines a lightweight, scriptable workflow that avoids...

Big data delivers eight strategic benefits for businesses, from deeper customer insight to real‑time decision making. By integrating diverse data sources—clickstreams, sensor feeds, social media—companies can personalize experiences, sharpen market intelligence, and streamline supply chains. Advanced architectures like lakehouses enable...

Denodo announced the release of Platform 9.4, a logical data management solution designed to accelerate trusted AI across enterprises. The update adds native vector‑database connectivity, embeds the Model Context Protocol for governed AI data access, and introduces a Lakehouse Accelerator powered...

Alation has launched outcome‑based governance, a system that replaces manual data‑governance processes with an agent‑driven operating model. The new Curation Automation feature, now generally available, automatically enriches and enforces metadata standards across the Alation platform. Organizations can declare business outcomes—such...

Cole Bowden’s DBTA webinar warned against over‑engineered data stacks and advocated a pragmatic approach to time‑series workloads. He urged firms to first assess whether data fits in memory or on a single drive before adopting a specialized database. When scale...

Big news: I added a new design pattern chapter called «Dynamic Query Design Pattern». The pattern's problem statement goes like this: 1. Provide immediate answers. 2. How you model it matters. 3. Dumping everything into the lake is painful. The core challenge is enabling...
"Shift left" comes from software engineering - finding bugs earlier in the development process. In data, shifting left means: validate data at the source, not after it breaks your dashboard. Instead of hoping bad data doesn't show up in your warehouse, you...
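"Validate at the source" concretely means the ingest path rejects bad records before they land in the warehouse. A minimal sketch, with an invented schema format:

```python
def validate_at_source(record, schema):
    """Reject bad records at ingest instead of discovering them
    in a broken dashboard. Schema: field -> (type, required)."""
    errors = []
    for field, (ftype, required) in schema.items():
        value = record.get(field)
        if value is None:
            if required:
                errors.append(f"missing required field: {field}")
        elif not isinstance(value, ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors

SCHEMA = {"order_id": (int, True), "amount": (float, True), "note": (str, False)}
clean = validate_at_source({"order_id": 1, "amount": 9.99}, SCHEMA)
dirty = validate_at_source({"amount": "free"}, SCHEMA)
```

A non-empty error list routes the record to a dead-letter queue rather than the warehouse, which is the "shift left" move in practice.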
A unified, domain‑aware anomaly detection pipeline maps retail transaction and network traffic streams to a common event schema, enabling real‑time monitoring of rare, high‑impact events. The approach extracts temporal features (e.g., time‑since‑last‑event) and contextual typicality without data leakage, then trains...
RT You can't have responsible AI without responsible data. Classify AI data, extend zero trust, encrypt in use, and spell out non-negotiable governance policies from day one. #AISecurity #DataGovernance @Star_CIO https://t.co/aiB5P99ido
SAP's BTP platform streamlines cloud migrations by offering tools for data quality, master data management, and analytics. It supports a 'clean core' approach, enabling organizations to differentiate with custom processes without complex upgrades. #SAP #CloudMigration #BTP https://t.co/94wouRrLLt

ECDB, founded in 2022, delivers transaction‑level e‑commerce market intelligence by processing more than 1 billion purchases each month—about 1‑2% of global online sales. Its platform normalises and enriches this data to provide near‑real‑time visibility across categories, retailers and markets. Retailers using...
SQLMesh takes dbt's concept and adds semantic understanding of SQL. It parses SQL statements, translates between dialects automatically, and offers compile-time validation. Built by Tobiko Data (now Fivetran). If you're starting fresh, it deserves serious consideration. https://www.ssp.sh/brain/sqlmesh
The article proposes governing real‑world health data as a public utility, using federated, standards‑based, community‑driven models to overcome fragmentation, proprietary control, and weak oversight. It cites ARPA‑H’s interest in economic models and highlights existing distributed networks and research enclaves as...
Birdzi introduced AskKea, a generative business intelligence assistant that lets grocery retailers ask plain‑English questions and receive decision‑ready answers, visualizations, and exportable data within minutes. The tool integrates structured and unstructured retail data, offering cross‑system insights such as category penetration,...

Rob Moffat tested Claude Code’s ability to generate a full dbt project for UK flood‑monitoring data. Using a concise prompt, the model produced a complete project structure, passed all dbt tests, and even fixed its own build errors. However, the...

DataStrike announced an expansion of its Microsoft Fabric services, targeting organizations that are adopting the unified analytics platform. The new portfolio includes a two‑week Fabric readiness and proof‑of‑concept engagement, end‑to‑end migration assistance, and 24/7 managed operations. Services span OneLake, lakehouse...
At HIMSS26, Sequoia Project’s Didi Davis unveiled the second USCDI v3 data‑usability guide, expanding coverage to all data classes and emphasizing provenance, traceability, and persistent identifiers. The 60‑page guide outlines use cases across provider‑to‑provider, provider‑to‑public‑health, and provider‑to‑consumer exchanges, aiming to curb...
I had a good friend tell me the apps I created were trash because I didn't have a formal database. I took that as a challenge. In 5 hours, I personally built, populated, and deployed a database containing all the content...
Graph databases are emerging as essential infrastructure for enterprise AI, offering a way to map relationships that reduces hallucinations, improves explainability, and enforces data governance. Neo4j’s CEO Emil Eifrem highlights that knowledge graphs give LLMs transparent access to corporate data,...

Data lakes often start as simple repositories but evolve into unmanaged dumping grounds as teams drop files without documentation or ownership. N‑iX consulting recommends a focused refresh that begins with the most‑used datasets, assigns clear owners, separates raw and curated...

The blog announces a natural language query engine for log platforms, letting users ask questions like “show me errors from payment service in the last hour” and receive instant results. By converting conversational intent into optimized SQL, the system removes...
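The translation from conversational intent to SQL can be as simple as matching a known question shape and binding its slots into a parameterized query. A deliberately tiny sketch supporting one pattern (real engines use an LLM or grammar, and the table/column names here are invented):

```python
import re

# One supported intent, mapped to a parameterized query template.
PATTERN = re.compile(
    r"show me (?P<level>\w+) from (?P<service>[\w-]+) service"
    r" in the last (?P<n>\d+) hours?"
)

def to_sql(question):
    """Translate a known question shape into (sql, params), or None."""
    m = PATTERN.fullmatch(question.strip().lower())
    if not m:
        return None
    sql = ("SELECT * FROM logs WHERE level = ? AND service = ? "
           "AND ts > datetime('now', ?)")
    params = (m["level"], m["service"], f"-{m['n']} hours")
    return sql, params

result = to_sql("show me errors from payment service in the last 1 hour")
```

Binding user text as parameters rather than splicing it into the SQL string keeps the generated query safe from injection.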

In this re‑aired episode, hosts Eric Dodds and John Wessel chat with regular guest Matt, the Cynical Data Guy, about the rise of low‑code data tools like Clay and the evolving role of the “GT‑M engineer.” They debate whether such...
A PostgreSQL production cluster was killed by the OOM killer after a single query consumed 2 TB of RAM, despite work_mem being set to only 2 MB. The investigation revealed that the query’s ExecutorState memory context retained hundreds of thousands of work_mem‑sized...
The Sovereign Data Supply Chain: Functional and Operational Framework version 1.0 proposes a structured governance model for data originating from indigenous and local territories. It aims to replace extractive data practices with sovereign, rights‑based chains across Latin America and the Caribbean....
Enterprise commerce brands are losing speed and revenue because data remains trapped in siloed systems, creating integration gaps that delay decisions and cause inventory errors. Gartner predicts 94% of CIOs will overhaul strategies within two years, yet less than half...

The Internal Revenue Service has issued a fast‑track sources‑sought notice for a new Business Intelligence Platform to collect, research and validate corporate and partnership taxpayer data. The contract will cover one base year and up to four option years, providing...

Clean, reliable data is the foundation of any effective CRM, yet most organizations watch their records degrade as leads flow in, updates occur, and integrations sync. Manual de‑duplication and field fixes are slow, error‑prone, and unsustainable at scale. Leveraging ETL...
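The de-duplication step that's unsustainable by hand is mechanical in an ETL job: group records by a normalized key and let newer non-null fields win. A toy sketch with invented record shapes:

```python
def dedupe_contacts(contacts):
    """Merge CRM records that share a normalized email, letting newer
    non-null fields win. A stand-in for an automated ETL dedup step."""
    merged = {}
    for rec in sorted(contacts, key=lambda r: r["updated_at"]):
        key = rec["email"].strip().lower()          # normalize the match key
        clean = {k: v for k, v in rec.items() if v is not None}
        clean["email"] = key                        # keep the normalized form
        merged.setdefault(key, {}).update(clean)    # newer fields overwrite older
    return merged

contacts = [
    {"email": "Ana@x.com", "phone": None, "name": "Ana", "updated_at": 1},
    {"email": "ana@x.com ", "phone": "555", "name": None, "updated_at": 2},
]
golden = dedupe_contacts(contacts)
```

Real pipelines add fuzzier keys (phone, name similarity) and survivorship rules per field, but the shape of the job is the same.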
A fintech audit platform replaced its monolithic HBase + Elasticsearch stack with a lakehouse built on Apache Iceberg, Parquet, and Spark Structured Streaming. Data is ingested from Kafka every five minutes, written to Iceberg tables, and queried via Apache Doris for low‑latency...

Precisely announced an OEM partnership with Matillion to embed cloud‑native ETL capabilities into its Data Integrity Suite. The integration adds low‑code, scalable transformation and pipeline automation to Precisely’s existing data quality, governance, and enrichment services. By unifying extraction, transformation, and...

Hammerspace announced a partnership with Secuvy to deliver a “Data‑First” integration that unifies unstructured data across edge, on‑premises, and multi‑cloud environments. The joint solution creates a global namespace, continuously discovers, classifies, and catalogs data, and applies policy‑driven security without copying...

Databahn, an AI‑native data pipeline platform, reported rapid enterprise traction, with Fortune 500 customers now representing over half its base. The company posted more than 400% year‑over‑year revenue growth and net revenue retention approaching 200%. Growth is...