Big Data Social Media and Updates

From Intelligence to Action: Rethinking the Data Stack
SocialMay 7, 2026

From Intelligence to Action: Rethinking the Data Stack

.@Google’s take: the modern data stack isn’t failing—it’s misaligned. Built for humans answering questions. Agents need systems that execute decisions—continuously, autonomously, at scale. The shift: from systems of intelligence → systems of action. That’s the real bet behind Agentic Data...

By Holger Müller
Jay Kreps Traces Kafka’s Birth and Confluent Foundations
SocialMay 5, 2026

Jay Kreps Traces Kafka’s Birth and Confluent Foundations

During @IBM Think day 1, keynote 3, @confluentinc and @apachekafka co-founder Jay Kreps explains how Kafka came about and how it became basis of Confluent. #Kafka #IBMThink #Think #Think2026 @robdthomas @ArvindKrishna @furrier @dvellante @dhinchcliffe @holgermu https://t.co/cb9TDjuQG2

By Sarbjeet Johal
SAP Expands Data Platform with Dremio, Prior Labs Acquisitions
SocialMay 4, 2026

SAP Expands Data Platform with Dremio, Prior Labs Acquisitions

.@SAP acquires @Dremio, @PriorLabs as it builds out its data platform plan https://t.co/78ISBnRjAP SAP said it will acquire Dremio, an open data lakehouse player, in a move that aims to use SAP Business Data Cloud combine SAP data with non-SAP...

By Holger Müller
SAP Bolsters AI Data Layer with Dremio, Prior Labs
SocialMay 4, 2026

SAP Bolsters AI Data Layer with Dremio, Prior Labs

@SAP to acquire @dremio and @prior_labs ... the data landscape for SAP AI is quickly evolving. My Take: 3 Positives + Good to see SAP working on the data foundation that powers Agentic AI. + Tabular data is essential for enterprise...

By Holger Müller
Unstructured Data Security Hindered by Visibility Gaps
SocialMay 4, 2026

Unstructured Data Security Hindered by Visibility Gaps

RT One of the biggest security challenges with unstructured data is the lack of visibility and lineage as information moves across systems, clouds, and teams." #DataLineage #AI @Star_CIO https://t.co/PYomJYHDkY

By Isaac Sacolick
Context, Not Models, Drives Generative BI Success
SocialMay 4, 2026

Context, Not Models, Drives Generative BI Success

Generative BI doesn’t fail because of models. It fails because of missing context. OpenBI solves that. → one standard → any platform → governed outputs From answers to certified BI assets. That’s the real shift ⚡ 📌 Part 4/4 on Generative BI https://t.co/rzlsMalE4X 🚀 Meet Databreeze at AI Week Milano (May...

By Giuliano Liguori
Synthetic Data Enables Safe Workforce Evolution Simulations
SocialMay 3, 2026

Synthetic Data Enables Safe Workforce Evolution Simulations

Use Synthetic #Data to Simulate Workforce Evolution Without Exposing Employee Data by @antgrasso #DataScience #BigData https://t.co/p4BdPIKlSB

By Ron van Loon
Agents Disrupt Database Contracts; Build Defensive Data Layers
SocialMay 1, 2026

Agents Disrupt Database Contracts; Build Defensive Data Layers

Have agents broken the unspoken contract we had with our databases? You know, human-authored apps running deterministic code and predictable queries? @arpit_bhayani wrote a terrific post for how to create a defensively designed data layer ... https://t.co/8OOXrmkD6g

By Richard Seroter
Centralize Metrics with a DRY Analytics API
SocialMay 1, 2026

Centralize Metrics with a DRY Analytics API

Instead of duplicating measures in each BI tool, store them centrally in an Analytics API. This is the DRY principle applied to metrics. One metric definition, accessible via GraphQL, SQL, or REST. https://www.ssp.sh/brain/analytics-api

By SSP Data
Data Migration Remains Underestimated Despite Persistent Industry Challenges
SocialMay 1, 2026

Data Migration Remains Underestimated Despite Persistent Industry Challenges

Data Migration Is Still Hard: Why the Industry Keeps Underestimating It "Every few years the IT industry rediscovers something that experienced practitioners already know: moving data is difficult." https://t.co/Qy0esoSAIR

By Dez Blanchfield
Turning Data Chaos Into Business‑Driving AI on Google Cloud
SocialMay 1, 2026

Turning Data Chaos Into Business‑Driving AI on Google Cloud

From Data Chaos to Confident AI: How Overdose Built a Semantic Foundation on Google Cloud https://t.co/MEyCiKRO4Q Overdose CEO Paul Pritchard unpacked exactly what it takes to get from scattered #data sources to #AI that actually moves the business forward.

By Holger Müller
Boomi Shifts to Data Activation, Tackles AI Semantic Gap
SocialApr 30, 2026

Boomi Shifts to Data Activation, Tackles AI Semantic Gap

I haven't written about @boomi in a while. That changed when I noticed they stopped calling themselves an integration company sometime in Q1 2026. The new label is "data activation company" and it's not cosmetic. Integration Platform as a Service...

By Shashi Bellamkonda
Grant Is the Proper Term for Permission on Entities
SocialApr 30, 2026

Grant Is the Proper Term for Permission on Entities

Randomly remembered the time when Grant Henke worked on AuthZ for Apache Kafka and tried to come up with a term for "giving a user permission to perform action on an entity". Grant, it is called "grant". As in: "GRANT...

By Gwen (Chen) Shapira
83% Predict AI Demand Will Outpace Data Infrastructure Soon
SocialApr 30, 2026

83% Predict AI Demand Will Outpace Data Infrastructure Soon

At least 83% of the 1,125 tech managers surveyed by Cockroach Labs agree that AI demand will exceed the capacity of most data infrastructure in the next 12–18 months. (My write-up in DBTA) https://t.co/H9Zchu6dgC

By Joe McKendrick
Replace Patterns in a Single Column Using Command‑line Tools
SocialApr 29, 2026

Replace Patterns in a Single Column Using Command‑line Tools

1/Need to replace a pattern, but only in column 5? Don't touch the rest of the file. Don't reach for Excel. Here’s how real data wranglers do it. https://t.co/MwsbgOKyVD

By Ming Tang
Claude Enables Unified Queries Across Structured and Unstructured Data
SocialApr 29, 2026

Claude Enables Unified Queries Across Structured and Unstructured Data

Using Claude to query the context layer and data fabric including unstructured documents. #AppianSummit #AI https://t.co/5gozb1CTrJ

By Isaac Sacolick
Data Fabrics and MCP Must Provide Clear AI Context
SocialApr 29, 2026

Data Fabrics and MCP Must Provide Clear AI Context

Michael Beckley - data fabrics and mcp must deliver clear context to AI agents. #AppianWorld #AI #CIO https://t.co/GOrMSwUFXy

By Isaac Sacolick
Custom SAP S/4HANA Tweaks Hide Major Risk
SocialApr 29, 2026

Custom SAP S/4HANA Tweaks Hide Major Risk

Public filings reveal a disconnect: 98% data cleanliness and fitting SAP S/4HANA's standard functionality sound great, but the devil's in the details. Customizations, not standard features, often introduce hidden risks and complexities. #DataGovernance #SAP #ProjectManagement https://t.co/s1WdakdrgC

By Eric Kimberling
Governance, Not AI, Drives Triple ROI for Leaders
SocialApr 29, 2026

Governance, Not AI, Drives Triple ROI for Leaders

This week, SAS marks 50 years. Bryan Harris opened the conference with a single frame: SAS was built to close the information gap, the gulf between data and the human capacity to act on it. AI is finally powerful enough...

By Sabine VanderLinden
AI Failures Reveal Data Quality as Critical Priority
SocialApr 29, 2026

AI Failures Reveal Data Quality as Critical Priority

Data Quality Assurance broke out this week. Not because anyone got excited about data quality. Because AI deployments started failing. The models were fine. The data wasn't. 18 months later, every serious team is circling back to the unsexy work. https://t.co/6W59KI6Zhb

By Yves Mulkers
AI Agents Poised to Dissolve Data Silos with AWS
SocialApr 28, 2026

AI Agents Poised to Dissolve Data Silos with AWS

Will AI agents finally break down data silos? I’m "relatively” optimistic. Jigar Thakkar is talking about how to do this with Amazon Quick #WhatsNextWithAWS #AWS https://t.co/SnP4khetyO

By Maribel Lopez
Amazon Quick Unifies Enterprise Data Across Cloud
SocialApr 28, 2026

Amazon Quick Unifies Enterprise Data Across Cloud

First announcement: Amazon Quick. A solution to bring enterprise data together. #CIO #AI #Cloud #WhatsNextWithAWS https://t.co/tgebOMT351

By Tim Crawford
Google's LangExtract: Free, Open‑Source Tool Beats $100K Solutions
SocialApr 28, 2026

Google's LangExtract: Free, Open‑Source Tool Beats $100K Solutions

RIP document extractors. Google just released LangExtract: Open-source. Free. Better than $100K enterprise tools. Here’s what it does: 🧵

By Matt Dancho
Bitemporal Modeling: Managing Data Across Valid and Transaction Times
SocialApr 28, 2026

Bitemporal Modeling: Managing Data Across Valid and Transaction Times

Valid time vs transaction time. When you need both. Bitemporal modeling handles historical data along two distinct timelines. https://www.ssp.sh/brain/bitemporal-modeling

By SSP Data
Genie Becomes Databricks’ Core AI for Semantic Analysis
SocialApr 28, 2026

Genie Becomes Databricks’ Core AI for Semantic Analysis

𝐆𝐞𝐧𝐢𝐞 is now the most important way to do data analysis in Databricks. What's unique about it is its ability to extract semantics from your entire Lakehouse, enabling it to answer complex data questions that cripple agents without a deep...

By Ali Ghodsi
Metadata Is the Database; Data only Works Through It
SocialApr 27, 2026

Metadata Is the Database; Data only Works Through It

Metadata is the database. Data is just the thing metadata lets you find, interpret, route, lock, replicate, and recover.

By Gwen (Chen) Shapira
Backfilling: The Mark of a Great Data Engineer
SocialApr 27, 2026

Backfilling: The Mark of a Great Data Engineer

Backfilling is where you see the difference between a data engineer and a great data engineer. A backfill means taking a data asset normally updated incrementally and updating historical parts of it. https://www.ssp.sh/brain/backfill

By SSP Data
Structured Data Drives Accurate, Low‑Risk AI Results
SocialApr 27, 2026

Structured Data Drives Accurate, Low‑Risk AI Results

#AI is only as good as the structure behind your data. Same model. Two setups. Completely different results. Structured + metadata → higher accuracy, lower risk. Unstructured → messy, inconsistent outputs. That’s the bottleneck most teams ignore. Check out the...

By Ronald van Loon Threads
Regulators Target Hotel AI Use Over Data Privacy
SocialApr 27, 2026

Regulators Target Hotel AI Use Over Data Privacy

Regulators turn attention to hotel AI governance “Hotels are increasingly deploying AI for functions such as … dynamic pricing, and personalised marketing. These systems rely on large volumes of behavioural and transactional data, raising questions about consent, storage, and secondary...

By Glen Gilmore
AI Dismisses Data Mesh vs Fabric, Emphasizes Fundamentals
SocialApr 27, 2026

AI Dismisses Data Mesh vs Fabric, Emphasizes Fundamentals

Data mesh vs data fabric was "the battle of the storytellers." Zhamak against the fabriconians. Then AI walked in and wiped both off the map. Storytellers win the round. Fundamentals win the decade. https://t.co/NPDJeWrmaE

By Yves Mulkers
OpenAI Triples Web Crawl Scale, Study Reveals
SocialApr 26, 2026

OpenAI Triples Web Crawl Scale, Study Reveals

Interested in how OpenAI is crawling (across its user-agents)? Good stuff from Chris Long of Nectiv based on 7B+ log file entries. Some good nuggets of information in the study. https://www.botify.com/blog/openai-tripled-web-crawl

By Glenn Gabe
Dataplex Becomes a Dynamic, Always‑on Enterprise Knowledge Catalog
SocialApr 24, 2026

Dataplex Becomes a Dynamic, Always‑on Enterprise Knowledge Catalog

"To address this problem, we are evolving Dataplex into a dynamic, always-on Knowledge Catalog that serves as the universal context engine for your enterprise, helping agents execute complex tasks with accuracy." https://t.co/C1akH2b5yu < I'm listening.

By Richard Seroter
External Tables Persist: Legacy Need Meets Modern Data Access
SocialApr 24, 2026

External Tables Persist: Legacy Need Meets Modern Data Access

Have you ever thought why Databricks, BigQuery, and others are still adding features such as External Tables? I've used them when starting my career in 2003, but why are they still used today? And what are the modern versions of...

By SSP Data
BI Dashboards Are Dying; Prepare for the Next Wave
SocialApr 24, 2026

BI Dashboards Are Dying; Prepare for the Next Wave

RIP BI Dashboards. Tools like Tableau and PowerBI are about to become extinct. This is what's coming (and how to prepare):

By Matt Dancho
From Imperative to Declarative: Rethink Data System Design
SocialApr 24, 2026

From Imperative to Declarative: Rethink Data System Design

Not just syntax. A fundamental shift in how you think about data systems. Imperative: dictate exact steps. Declarative: describe what you want, system figures out how. https://www.ssp.sh/brain/declarative-vs-imperative

By SSP Data
Unstructured Data Security Hindered by Visibility Gaps
SocialApr 24, 2026

Unstructured Data Security Hindered by Visibility Gaps

One of the biggest security challenges with unstructured data is the lack of visibility and lineage as information moves across systems, clouds, and teams." #DataLineage #AI https://t.co/PYomJYHDkY

By Isaac Sacolick
Data Trust and Transparency Demand Urgent Reform
SocialApr 23, 2026

Data Trust and Transparency Demand Urgent Reform

A cracking podcast with @MrMBrown and @DamianPudner of @GreatBritishTT talking Data: Trust, Transparency, and the Need for Reform https://t.co/fOQA35x2rN

By Michael Hewson
Governance Success Starts with People, Not Tools
SocialApr 23, 2026

Governance Success Starts with People, Not Tools

Most data governance projects fail because they start with the tool. Not the process. Not the people. Colruyt Group flipped that: they aligned roles, tested the model, then scaled In the AI era, governance isn’t control—it’s enablement. by @ronald_vanloon, Grimme Bogaerts & Martijn...

By Ronald van Loon Threads
OpenDataLoader PDF: Trending Parser Boosts RAG Pipelines
SocialApr 23, 2026

OpenDataLoader PDF: Trending Parser Boosts RAG Pipelines

Someone just open-sourced a PDF parser that converts documents to Markdown, JSON, and HTML — and it's currently the #1 trending repository on GitHub. It's called OpenDataLoader PDF. Here's why data scientists building RAG pipelines should pay attention.

By Matt Dancho
Simple Fixes Like Quantization Beat Proprietary Hype
SocialApr 22, 2026

Simple Fixes Like Quantization Beat Proprietary Hype

Once again I learn that: - Everyone is aware of pgvector limitations (filters, perf when doesn't fit in memory) - Vendors use this to advocate for proprietary alternatives - Almost no one talks about the simple solutions: Quantization / halfvec, partitions, partial indexes.

By Gwen (Chen) Shapira
Google Cloud Spanner Omni: Portable, Best‑in‑Class Database Everywhere
SocialApr 22, 2026

Google Cloud Spanner Omni: Portable, Best‑in‑Class Database Everywhere

The best database on the internet is now available ... anywhere? We just announced @googlecloud Spanner Omni, with reimagined TrueTime and Colossus components for portability. https://t.co/iHBOKF7nH2 Pull the container here: "docker pull https://t.co/g26aVGW0aX" https://t.co/zqU7Z1g3Lp

By Richard Seroter
No Excuse for Type Errors in 2026.
SocialApr 22, 2026

No Excuse for Type Errors in 2026.

I honestly don't get it. In a few weeks I - by myself - could build a canonical, relational, evolvable, trace-able, PII-cleansed, auditable data architecture merging external and internal data sources with strict permissioning and provenance, with agentic, context-aware retrieval,...

By Adam Butler
Data Readiness Determines AI ROI Success
SocialApr 22, 2026

Data Readiness Determines AI ROI Success

#Ad Your AI strategy is only as strong as the data behind it. Cloudera’s latest report, The Data Readiness Index 2026, shows why data access, governance, quality and infrastructure have become critical factors in determining whether AI initiatives deliver real...

By Bernard Marr
AI Transforms Governance of Enterprise Unstructured Data
SocialApr 21, 2026

AI Transforms Governance of Enterprise Unstructured Data

"Unstructured data now makes up the vast majority of enterprise information, and AI is redefining how organizations bring control, accessibility, and security to it." #DataGovernance #AI https://t.co/PYomJYHDkY

By Isaac Sacolick
Seamless Hand‑off: Stream Processing to Messaging to AI Agent
SocialApr 21, 2026

Seamless Hand‑off: Stream Processing to Messaging to AI Agent

Not always, but sometimes the pieces just fit together well. I like this post that shows a clean handoff from : Real-time stream processing --> messaging engine for data transformation --> AI agent that processes the message https://t.co/wGIr7GLVtQ https://t.co/VgaydbuoUW

By Richard Seroter
Embed Governance in Tools, Not Just Policies
SocialApr 21, 2026

Embed Governance in Tools, Not Just Policies

Most companies think data governance is about policies and committees. The ones that get it right embed governance into their tools. https://www.ssp.sh/brain/data-governance

By SSP Data
Data Governance: Fast‑Pass to Faster, Confident Decisions
SocialApr 20, 2026

Data Governance: Fast‑Pass to Faster, Confident Decisions

Data governance sounds like bureaucracy. In reality, it’s a fast-pass to faster, more confident decision-making. Here’s why it matters more than ever 👉 https://t.co/BXkcbZnc00 @DI_tweet #HIMSS26 #HITSM

By Colin Hung
Personalized PageRank Reveals High-Impact Nodes in 9M-Edge Graph
SocialApr 20, 2026

Personalized PageRank Reveals High-Impact Nodes in 9M-Edge Graph

So... "ai, explain what I just did": we ingested 9M+ directed edges from X into a weighted influence graph, then ran a personalized PageRank variant — tuned for human accounts — to surface eigenvector centrality at scale. once you know the...

By Kevin Rose
Data Lakes Gain Full ACID Guarantees Like Traditional Databases
SocialApr 20, 2026

Data Lakes Gain Full ACID Guarantees Like Traditional Databases

Normally ACID means a database. But now data lakes like Delta Lake added these features too. Atomicity, Consistency, Isolation, Durability. Simple files on S3 now have the same guarantees as Postgres. https://www.ssp.sh/brain/acid-transactions

By SSP Data