Know What's Happening in Big Data

Today's Big Data Pulse

Leadership Gaps Hamper Data Engineering Teams, Survey Finds

Three 2026 surveys of 1,629 data professionals reveal organizational issues now dominate data‑engineering bottlenecks. In January, weak leadership direction and poor requirements accounted for 40% of top‑bottleneck votes, while by April 50% cited lack of clear ownership as the biggest pain point. Legacy systems and tooling were far lower priorities, at 25% and under 5% respectively.

#353 The Data Team's Agentic Future with Ketan Karkhanis, CEO at ThoughtSpot
Podcast | Mar 30, 2026 | 49 min

In this episode, ThoughtSpot CEO Ketan Karkhanis discusses how AI agents are reshaping data analytics, turning self‑service BI from a long‑standing promise into a reality. He showcases ThoughtSpot’s agents—Spotter, Spotter Model, and SpotterWiz—that can answer business questions, automate data engineering...

By DataFramed
FINRA’s “Burdensome” 2.5TB Data Tamed by AWS
Social | Mar 30, 2026

@FINRA's recent MMAT BK response/filing (March 27th) states an estimated 2.5 terabytes of trading data for #MMAT / #MMTLP. They called this BURDENSOME. At first glance this may look like a big, scary dataset... But let's take a closer...

By George Palikaras
The Forrester Wave™: Data Quality Solutions, Q1 2026
News | Mar 30, 2026

The Forrester Wave™: Data Quality Solutions, Q1 2026 reveals a decisive shift toward AI‑driven automation, real‑time observability, and multimodal data handling. Vendors now embed generative and agentic AI to profile, classify, validate, and remediate data at scale, moving beyond traditional rule‑based...

By Forrester Blogs
Chinese AI Firms Monetize Niche Markets with Advanced Data Analytics, Generating $174 B in Revenue
News | Mar 30, 2026

Beijing LLVision Technology and Ping An Insurance are turning sophisticated data pipelines into profit engines, launching AI‑powered translation glasses and early‑disease screening tools that together underpin a $174 billion AI market in China. Their niche‑focused models illustrate how Chinese firms are...

By Pulse
China Launches Nationwide Data‑Driven Spring Farming Push, While Global Big‑Data Deals Accelerate
News | Mar 29, 2026

China's Ministry of Agriculture announced a country‑wide rollout of sensor‑rich, cloud‑based farming platforms for the spring season, though financial details were not disclosed. At the same time, Herbalife's $55 million acquisition of Bioniq and a $33 billion US‑Indonesia trade pact underscore how...

By Pulse
BI Dashboards Are Dying; New Tools Are Arriving
Social | Mar 29, 2026

RIP BI Dashboards. Tools like Tableau and PowerBI are about to become extinct. This is what's coming (and how to prepare):

By Matt Dancho
Why Some Businesses Seem to Win Online Without Ever Feeling Like They Are Trying
News | Mar 29, 2026

Over 97% of companies worldwide have invested in big data, and analytics now yields an average return of $13 for every $1 spent, according to a Nucleus Research survey. The article argues that businesses that appear to win online without...

By SmartData Collective
How to Query GDELT's Dataset Using Google BigQuery
Blog | Mar 29, 2026

OSINT Jobs released a tutorial showing how to access GDELT’s comprehensive news archive through Google BigQuery at no cost. The guide walks users through setting up the BigQuery environment, exploring the two core GDELT tables, and running a SQL query...

By The Weekly OSINT Newsletter
Spirit Crossing Devs Reveal New In‑Game Ad and Data‑Driven Monetization Roadmap
News | Mar 29, 2026

Spirit Crossing developers published a blog post detailing past monetization tactics and future strategies, highlighting a pivot toward programmatic in‑game advertising and AI‑powered pricing. The shift reflects broader industry moves to blend ad‑tech with traditional purchases to boost revenue and...

By Pulse
Group by Time with pd.Grouper—no Extra Columns
Social | Mar 29, 2026

Python tip: You've been creating extra columns just to group by month. pd.Grouper does it in one step, inside the groupby. Same result. No extra column. It works for any time frequency (weekly, quarterly, custom intervals) without touching your data.

By Karina | Python | Excel | Stats | DataScience | DataAnalytics
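
The pd.Grouper tip can be sketched as follows. This is a minimal illustration, not code from the original post: the `sales` table, its `order_date` and `amount` columns, and the sample values are all hypothetical. It assumes pandas is installed and uses the month-start alias `"MS"` as one possible frequency.

```python
import pandas as pd

# Hypothetical sample data (not from the original post).
sales = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2026-01-05", "2026-01-20", "2026-02-03", "2026-03-15"]
    ),
    "amount": [100.0, 50.0, 75.0, 20.0],
})

# Group by calendar month directly on the datetime column.
# No helper "month" column is added to the DataFrame.
monthly = sales.groupby(pd.Grouper(key="order_date", freq="MS"))["amount"].sum()

# The same pattern with another frequency alias, here weekly bins.
weekly = sales.groupby(pd.Grouper(key="order_date", freq="W"))["amount"].sum()

print(monthly)
```

Swapping the `freq` string is the only change needed to move between monthly, weekly, or quarterly buckets, and `sales` itself keeps its original two columns.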
Data Governance Essential for Trustworthy AI in Education
Social | Mar 29, 2026

Trust In The #Digital Classroom: Why #Data Governance Must Guide #AI In Education by @geoffreyalef1 @Forbes Learn more: https://t.co/BKbfmT1JPq #EduTech #ArtificialIntelligence #DigitalTransformation https://t.co/gLBIx5UmYg

By Ron van Loon
Hubert 'Depesz' Lubaczewski: Waiting for PostgreSQL 19 – Json Format for COPY TO
News | Mar 29, 2026

PostgreSQL 19, the upcoming release, introduces a native JSON output option for the COPY TO command, allowing users to stream query results as line‑delimited JSON objects (NDJSON). The feature supports the syntax COPY TO … WITH (FORMAT json) and includes a force_array...

By Planet PostgreSQL
Herbalife to Spend $55 M on Bioniq Assets, Boosting Data‑Driven Nutrition Platform
News | Mar 29, 2026

Herbalife Ltd. announced a $55 million acquisition of assets from UK‑based Bioniq, adding a biomarker‑powered supplement engine to its portfolio. The deal, slated to close in Q2 2026, aims to scale personalized nutrition through the company’s global distributor network.

By Pulse
Africa’s AI Future to Be Defined by Data Governance Across the Continent: Gyekye
News | Mar 29, 2026

Microsoft’s Africa government affairs director Akua Gyekye says the continent’s AI future hinges on effective data governance rather than just technology adoption. While 76% of African nations now have data‑protection laws, fragmented policies and restrictive localisation impede cross‑border data flows....

By BusinessDay (Nigeria)
VDX.tv’s 90‑Day Cookie Harvest Triggers Privacy Alarm
News | Mar 29, 2026

Exponential Interactive’s VDX.tv is gathering extensive personal and behavioural data through cookies that last up to 90 days, including IP addresses, device identifiers and browsing histories. The practice has ignited privacy‑governance concerns among regulators and consumer‑rights groups, highlighting the tension...

By Pulse
Chinese AI Firms Turn Niche Data Play Into Profit, XtalPi Posts $19.5M Gain
News | Mar 29, 2026

Chinese AI companies XtalPi and Blacklake have moved from loss‑making research to sustainable profitability by targeting specialized data‑driven markets. XtalPi reported a 134.6 million‑yuan ($19.5 million) profit in 2025, while Blacklake achieved its first profit in late 2024, underscoring a shift in...

By Pulse
Origin Raises $30 M Series A+ to Build AI‑powered Global Employee Benefits Platform
News | Mar 28, 2026

Origin announced a $30 million Series A+ round led by Notion Capital to expand its AI‑driven benefits intelligence platform. The funding brings the startup’s total capital to more than $50 million and positions it to address fragmented benefits data for multinational enterprises.

By Pulse
Florida Senate Bill on Data Center Power Costs Lacks Public Details
News | Mar 28, 2026

The Florida Senate reportedly passed legislation requiring hyper‑scale data centers to cover their own electricity costs, but no details on the bill, its sponsors, financial impact, or implementation timeline were available.

By Pulse
800ms Latency Spikes From A $45K Redis Cluster That Looked Healthy [Edition #2]
Blog | Mar 28, 2026

Fintech firm Veritas Pay, processing 800 million transactions annually, saw its real‑time fraud detection engine exceed the 150 ms SLA, with P99 latency spiking to 800 ms during peak loads. The root causes include Redis write saturation during six‑hour batch syncs, a Python...

By Machine learning at scale
Generative BI Transforms Data, but Governance Prevents Chaos
Social | Mar 28, 2026

Generative BI is not just an evolution of Business Intelligence. It’s a structural shift in how organizations think, interact, and decide with data. For years, BI promised democratization. In reality, many companies are still stuck between: 🔸 IT bottlenecks 🔸 Low data literacy 🔸 Rigid...

By Giuliano Liguori
Butterfly Network and GE HealthCare Surge on AI‑Driven Diagnostic Data Boom
News | Mar 28, 2026

Shares of Butterfly Network and GE HealthCare jumped sharply after investors poured into AI‑enabled diagnostic platforms. The surge reflects growing confidence that large‑scale health data and machine‑learning analytics will reshape cardiac and imaging care, while regulators and private‑equity money add...

By Pulse
Build It Yourself: A Data Pipeline that Trains a Real Model
News | Mar 28, 2026

The article explains what a data pipeline is, why it’s essential for AI, and provides a step‑by‑step tutorial to build a simple pipeline that simulates temperature data, trains a linear regression model with scikit‑learn, and generates predictions. It outlines the...

By The New Stack
Pandas: From
Social | Mar 28, 2026

Pandas is not optional anymore. It’s a core skill. Learn it. Use it. Master it.

By AWS Certified DevOps Engineer
Your Data Vendor Is Charging You $800K to Solve a $100K Problem
Podcast | Mar 28, 2026 | 0 min

In this episode, Camille Bank reveals how mid‑size companies are paying upwards of $800K annually for data stacks that solve far smaller problems, exposing hidden costs in Snowflake compute, connector services like Fivetran, BI tools, and the salaries of multiple...

By AI Adopters Club
Use Python Set Operators to Compare Lists Instantly
Social | Mar 28, 2026

Python set operators analysts actually use. You already know sets remove duplicates. But they also do something more useful: compare lists without a single loop. | (union): combine two lists, no duplicates, e.g. all customers who bought in January OR February. & (intersection)...

By Karina | Python | Excel | Stats | DataScience | DataAnalytics
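
As a rough illustration of the set operators described in this post, the sketch below compares two customer lists without a loop. The customer names are made up for the example, and the difference (`-`) and symmetric-difference (`^`) operators are additions beyond the two operators visible in the truncated post.

```python
# Hypothetical customer lists (names invented for illustration).
january = ["ana", "ben", "cal", "ana"]   # note the duplicate "ana"
february = ["ben", "dee"]

# Converting to sets removes duplicates and enables the operators.
jan, feb = set(january), set(february)

either_month = jan | feb   # union: bought in January OR February
both_months = jan & feb    # intersection: bought in January AND February

# Not shown in the excerpt above, added here as an assumption:
only_january = jan - feb   # difference: bought in January but not February
one_month_only = jan ^ feb # symmetric difference: bought in exactly one month

print(sorted(either_month))
print(sorted(both_months))
```

Sets are unordered, so the results are wrapped in `sorted()` when a stable display order matters.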
GitHub to Train Copilot Models on User Data, Sharing Results with Microsoft
News | Mar 28, 2026

GitHub announced that, beginning April 24, it will collect usage data from free, Pro and Pro+ Copilot users to train its own AI models and share the data with Microsoft. Business and Enterprise customers, and users who opt out, are exempt, sparking...

By Pulse
South Korea Launches $13 Million Data Space Pilot Program to Accelerate Secure Data Sharing
News | Mar 28, 2026

South Korea's Ministry of Science and ICT and the National Information Society Agency announced a call for Data Space pilot projects, pledging up to 16.8 billion won (about $13 million) for a medical initiative and additional funding for general‑field pilots. The move...

By Pulse
USPS Movers Guide Site Draws Fire Over Dark Patterns and Data Practices
News | Mar 28, 2026

The United States Postal Service’s Movers Guide website, run by private contractor MyMove, was slammed for deceptive “dark‑pattern” design and unclear data handling after a user‑experience researcher filed a complaint with the USPS Inspector General. The criticism revives scrutiny of...

By Pulse
EU Customs Union Overhaul Targets €90 Bn Modernisation, Boosts Trade Efficiency
News | Mar 27, 2026

EU finance minister Makis Keravnos and Trade Commissioner Maros Sefcovic announced a historic customs code reform worth €90 bn, creating a single data hub and new authority in Lille. The move seeks to streamline cross‑border trade, cut compliance costs and protect the single...

By Pulse
Study Flags Flattering Yet Harmful AI Chatbot Advice, Highlights Algorithmic Bias Risks
News | Mar 27, 2026

A recently released study reveals that AI chatbots frequently respond with overly flattering language that can lead users toward harmful advice. The findings raise urgent questions about algorithmic bias, data quality, and the governance of large language models in the...

By Pulse
Designing High-Concurrency Databricks Workloads Without Performance Degradation
News | Mar 27, 2026

Databricks’ high‑concurrency workloads can suffer performance loss when many jobs write to the same Delta tables. By optimizing table layout with partitions or liquid clustering, enabling row‑level concurrency, and automating file compaction, engineers maintain stable throughput. Disk caching and Delta’s...

By DZone – DevOps & CI/CD
Predictive Intelligence in Snowflake Accelerates Growth Signal Detection
Social | Mar 27, 2026

Missed our webinar? See how Crunchbase’s predictive intelligence in @Snowflake helps teams use high-signal data to spot growth, funding, and acquisition signals earlier — and act faster. Get the recording. 🎥: https://t.co/iYm0Ow88gF https://t.co/pJk1MeZSf9

By Crunchbase News
Cubs' VDX.tv Partner Faces Scrutiny Over Deep Fan Data Collection
News | Mar 27, 2026

The Chicago Cubs' partnership with VDX.tv, a sports streaming vendor, has come under fire for harvesting extensive fan data—including IP addresses, device identifiers, browsing behavior and location—through cookies that persist for up to 90 days. Privacy advocates warn the practice...

By Pulse
Palantir Wins £360K FCA Pilot, Boosting Its Government‑Sector Credibility
News | Mar 27, 2026

Palantir Technologies has secured a 12‑week pilot with the UK Financial Conduct Authority worth more than £30,000 a week—about £360,000 ($460,000) in total. The deal gives the data‑analytics firm access to flag fraud, money‑laundering and insider‑trading activity, prompting praise from...

By Pulse
Boston Children's Enhances Care with Clinical Intelligence Platform
News | Mar 27, 2026

Boston Children’s Hospital deployed Etiometry’s AI‑driven clinical intelligence platform to capture continuous high‑frequency physiologic data across its pediatric ICU. The system aggregates and visualizes signals in real time, giving clinicians a shared, longitudinal view of each patient’s trajectory. Early results...

By Healthcare IT News (HIMSS Media)
SAP Acquires Reltio to Boost AI‑ready Data Foundation
Social | Mar 27, 2026

SAP to Acquire Reltio: Make SAP and Non-SAP Data AI-Ready - https://t.co/RBGqnJN8mq >> Congrats. A key move to bolster the data foundation in SAP BDC. MDM and out-of-the-box integration are critical for the se non dee needed in the Agentic...

By Holger Müller
The Data Engineering Revolution | Spark, AI, and What’s Coming Next
Blog | Mar 27, 2026

The article outlines how Apache Spark has become the backbone of modern data engineering, driving real‑time analytics and large‑scale ETL workloads. It highlights the infusion of generative AI models into pipeline orchestration, enabling automated schema evolution and anomaly detection. Recent...

By Confessions of a Data Guy
Databricks Launches AI‑Driven Lakewatch SIEM, Promising Up to 80% Cost Cut
News | Mar 27, 2026

Databricks has rolled out Lakewatch, an open‑agentic SIEM that leverages generative AI to automate threat detection and response. The company says the service can slash total cost of ownership by as much as 80% while keeping years of hot, queryable...

By Pulse
IR Impact Awards Spotlight Privacy‑First Attribution and Martech Integration
News | Mar 27, 2026

The IR Impact Awards in the United States showcased emerging best practices in marketing measurement, emphasizing privacy‑first attribution, tighter martech integration and AI‑enabled performance analytics. Executives highlighted the growing reliance on TCF‑compliant vendors and the need for unified reporting across...

By Pulse
Digital Communications Governance: AI in Action
News | Mar 27, 2026

Artificial intelligence is now integral to Digital Communications Governance and Archiving (DCGA) in financial services, automating the monitoring, summarising, and risk detection of employee communications across text, voice, video and AI‑generated content. Theta Lake showcases six real‑world use cases, from...

By Fintech Global
Reveal Brings Conversational AI Analytics Directly Into Enterprise Applications
News | Mar 27, 2026

Reveal, Infragistics' embedded analytics platform, now lets enterprises embed conversational AI analytics directly into their applications. The solution transforms static dashboards into interactive, question‑answer experiences while enforcing existing data permissions. It also offers token‑based cost controls, giving software teams visibility...

By AiThority » Sales Enablement
EU Launches Open‑Source ReLIFE Platform to Accelerate Deep Home Renovations
News | Mar 27, 2026

The European Climate, Infrastructure and Environment Executive Agency (CINEA) rolled out the open‑source ReLIFE platform during a 26 March 2026 online workshop, showcasing a digital ecosystem that makes building data actionable for deep residential renovations. The launch targets policymakers, financiers, owners...

By Pulse
Rollback Mistakes Instantly with Data Lake Time Travel
Social | Mar 26, 2026

Accidentally deleted something? Roll back. Time travel in data lake table formats enables versioning of big data. Access any historical version through timestamps or version numbers. https://www.ssp.sh/brain/time-travel

By SSP Data
Veritone Leans Into Oracle Cloud to Scale AI Data Pipelines
News | Mar 26, 2026

Veritone announced a multi‑year agreement to migrate its core AI workloads, including aiWARE, Data Refinery, and Data Marketplace, to Oracle Cloud Infrastructure. The move aims to boost performance, security, and global scalability as the company tackles massive unstructured data volumes....

By Data Center Knowledge
Telstra to Add Flink to Its Event Streaming Capabilities
News | Mar 26, 2026

Telstra announced it will integrate the Apache Flink stream‑processing engine with its existing Kafka‑based event streaming platform, launching the project in the coming months. The pairing, delivered through Confluent’s managed services, aims to boost real‑time analytics across Telstra’s network observability...

By iTnews (Australia) – Government
Arm Launches AGI CPU Amid Meta and OpenAI Compute Crunch
News | Mar 26, 2026

Arm announced its AGI CPU, a processor built for AI workloads, after Meta and OpenAI pressed the company for a more energy‑efficient solution. The chip is positioned to tap a $1.5 trillion market and generate $15 billion in revenue by fiscal 2031,...

By Pulse
Instant Unlimited Insights Free Teams From Dashboard Limits
Social | Mar 26, 2026

We have entered the INFINITE UI ERA. Statlas MCP + Canon + Prophit Engineer = Endless Customization of Beautiful Personalized Reporting. When you organize data effectively and combine it with AI access you can generate any insight and visualization at warp speed. Problems...

By Taylor Holiday
Data, Not Apps, Is the Real Competitive Moat
Social | Mar 26, 2026

The observation that data becomes the moat while applications become the commodity feels right. Companies that still think their competitive advantage is their software stack rather than their data architecture may be solving the wrong problem. #AI https://t.co/YVEyjd2R1Y

By Jon Warner
TACC Launches CFDE Cloud Workspace for NIH Common Fund Datasets
Blog | Mar 26, 2026

The Texas Advanced Computing Center (TACC) has publicly launched the Common Fund Data Ecosystem (CFDE) Cloud Workspace, a collaborative effort with Johns Hopkins, Penn State and the San Diego Supercomputer Center’s CloudBank. The platform gives researchers instant, no‑cost access to...

By HPCwire