Know What's Happening in Big Data

Today's Big Data Pulse

Leadership Gaps Hamper Data Engineering Teams, Survey Finds

Three 2026 surveys of 1,629 data professionals reveal organizational issues now dominate data‑engineering bottlenecks. In January, weak leadership direction and poor requirements accounted for 40% of top‑bottleneck votes, while by April 50% cited lack of clear ownership as the biggest pain point. Legacy systems and tooling were far lower priorities, at 25% and under 5% respectively.

NatWest Group CEO Touts Near-Term Agentic AI Workflow Future
News | Feb 16, 2026

NatWest Group CEO Paul Thwaite announced that the bank is shifting from basic chatbots to autonomous AI systems capable of executing complex banking workflows for customers. He expects many of the underlying components to be operational within the year. However,...

By The Stack (TheStack.technology)
AMD Takes on Nvidia in India With Expanded Tata AI Partnership
News | Feb 16, 2026

Advanced Micro Devices (AMD) announced a partnership with Tata Consultancy Services (TCS) to bring its Helios AI data‑center blueprint to India. The collaboration aims to deploy up to 200 megawatts of AI‑infrastructure capacity, positioning AMD against Nvidia in the rapidly expanding...

By Bloomberg – Technology
Qatar Advances Sovereign Cloud Strategy to Strengthen Digital Trust and National Autonomy
News | Feb 16, 2026

Qatar is accelerating a sovereign cloud strategy to keep sensitive data under domestic law, leveraging its Personal Data Privacy Protection Law as a regulatory backbone. Deloitte’s Cloud Centre of Excellence in Lusail is driving the effort, having migrated over 3,000...

By Computer Weekly – Latest IT news
Snowflake's Micro-Partitions Promote Lazy Modeling, Undermine Optimization
Social | Feb 15, 2026

I hate Snowflake micro-partitions and optimizations for a few reasons. They make data modeling lazy: if you don’t have to understand the partitioning or shape of your data, you can just slap the data into Snowflake and call it a...

By Zach Wilson
Europe Is Coming After Infinite Scroll – TikTok's Endless Feed Is Now a Legal Problem
News | Feb 15, 2026

The European Commission has formally accused TikTok of designing its endless‑scroll feed to be addictive, especially for minors, and is treating this as a systemic risk under the Digital Services Act. The preliminary ruling targets infinite scroll, algorithmic recommendations and...

By TechSpot
Three Red Flags of Non‑Idempotent Data Pipelines
Social | Feb 15, 2026

From Zach Wilson, three signs your pipeline isn't idempotent: 1. It uses INSERT INTO instead of INSERT OVERWRITE or MERGE 2. Date filters have "date > start" but no "date < end" - this causes exponential backfill costs 3. Source tables are always...
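
A minimal sketch of the first two red flags, assuming a hypothetical `daily_sales` table and using Python's built-in `sqlite3` as a stand-in engine. SQLite has no INSERT OVERWRITE or MERGE, so the idempotent path below swaps in a DELETE plus INSERT over a bounded date window inside one transaction. The table, column, and function names are illustrative and do not come from the original post.

```python
import sqlite3

# Hypothetical source rows: (ISO date string, amount).
SOURCE = [("2026-02-14", 10.0), ("2026-02-15", 12.5), ("2026-02-16", 7.0)]


def make_conn():
    """Create an in-memory database with an empty, illustrative target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (ds TEXT, amount REAL)")
    return conn


def load_append(conn, rows):
    """Non-idempotent load: a plain INSERT appends, so a rerun duplicates rows."""
    conn.executemany("INSERT INTO daily_sales (ds, amount) VALUES (?, ?)", rows)
    conn.commit()


def load_window_overwrite(conn, rows, start, end):
    """Idempotent load: replace exactly the half-open window [start, end)."""
    window = [r for r in rows if start <= r[0] < end]  # bounded on both sides
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM daily_sales WHERE ds >= ? AND ds < ?", (start, end)
        )
        conn.executemany(
            "INSERT INTO daily_sales (ds, amount) VALUES (?, ?)", window
        )


def row_count(conn):
    """Return the number of rows currently in the target table."""
    return conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
```

Calling `load_append` twice with the same rows doubles the table, while calling `load_window_overwrite` twice for the same window leaves it unchanged. The upper bound on the date filter is what keeps a rerun or backfill of one day from touching every later day as well.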

By SSP Data
Holographic Tape Inches Closer to Mass Market Ahead of Silica, Ceramic Media - 200TB WORM Tech Set to Debut in...
News | Feb 14, 2026

HoloMem, a UK startup, successfully ran its holographic tape system alongside traditional LTO drives inside a live LTO library, proving plug‑and‑play compatibility with existing data‑center hardware. The polymer‑ribbon cartridges are sized like standard LTO tapes and can store up to...

By TechRadar Pro
Analytics: The Easiest Gateway Into Tech Careers
Social | Feb 14, 2026

As a CS girlie, I started my journey in Analytics and to this day, I still believe it has one of the lowest barriers to entry into a career in tech. The barrier has never been lower. If you’re thinking about...

By Ebere Oyek (Nelo) — Data | AI | ML
ISL Replaces ETL: Intelligence Beats Raw Data Ingestion
Social | Feb 14, 2026

Ingest Structure Learn (ISL) is the new ETL. It used to be the case that a company would try to license this kind of data as an “edge”. I’ve seen many companies in SV try to make this claim. That...

By Chamath Palihapitiya
The $800B Open Secret: What the New Medicaid Spending Dataset Means for Health Tech Builders and Investors
Blog | Feb 14, 2026

The episode breaks down the release of the largest publicly available Medicaid claims dataset, detailing its composition, gaps, and immediate utility for health‑tech builders and investors. It quantifies the scale of Medicaid spending (~$849 B) and improper payments (over $30 B annually),...

By Thoughts on Healthcare Markets & Tech
WrenAI Automates BI with AI-Powered Text2SQL
Social | Feb 14, 2026

Move over Tableau and PowerBI. There's a new Python library that automates Business Intelligence with AI using Text2SQL. Let me introduce you to WrenAI:

By Matt Dancho
Beyond Accuracy: Build Actionable AI Models and Agents
Social | Feb 13, 2026

Most portfolios fail because they stop at “model accuracy.” A good AI/DS portfolio has: 1. A model that predicts something the business can act on 2. An AI agent that turns outputs into next steps It's that simple. Want help?

By Matt Dancho
[In-Depth Guide] The Complete CTGAN + SDV Pipeline for High-Fidelity Synthetic Data
News | Feb 13, 2026

The article walks through a production‑grade synthetic data pipeline that combines CTGAN with the SDV ecosystem, starting from raw mixed‑type tables and ending with model serialization. It demonstrates how to attach metadata, enforce numeric and categorical constraints, and perform conditional...

By MarkTechPost
AI Agents Speed Up Database Benchmarking for Patch Reviews
Social | Feb 13, 2026

I discovered a new favorite use of AI Agents. Get ☕️, it's a bit long: If you follow the postgres-hackers list, you know this pattern: someone submits a patch, then someone else raises performance concerns with the patch. The rational thing to do...

By Gwen (Chen) Shapira
Akron Children's Uses Epic and Real-Time Analytics to Reduce Waste Anesthesia Gases
News | Feb 13, 2026

Akron Children’s Hospital leveraged its Epic EHR and real‑time analytics to dramatically cut waste anesthesia gases, a source of 5‑10% of its greenhouse‑gas emissions. By introducing low‑flow reminders in Epic and on anesthesia machines, the team achieved an initial 5%...

By Healthcare IT News (HIMSS Media)
Unlocking Your Retail Insights with LLMs
News | Feb 13, 2026

Best Buy is leveraging large language models to clean and enrich messy retail data, turning unstructured customer signals into actionable insights. The article stresses that LLM adoption must start with a clear business case rather than hype, especially for tasks like...

By AI Accelerator Institute
Data Engineering: Experience Beats Tutorials Through Pattern Recognition
Social | Feb 13, 2026

After years in data engineering, I've realized the job is mostly pattern recognition. You see a problem. You recognize it as a variant of a problem you've solved before. You apply a known solution with modifications. This is why experience matters more...

By SSP Data
Migrating to Databricks – A Guide
Blog | Feb 13, 2026

The guide cautions that moving to Databricks won’t fix weak data fundamentals; organizations must first establish clear dev‑prod separation, version‑controlled code, and cost accountability. It urges teams to define real needs, avoid over‑architecting, and split infrastructure choices from data‑architecture decisions....

By Confessions of a Data Guy
Great Tables Turns DataFrames Into Presentation‑ready Tables
Social | Feb 13, 2026

Turning a DataFrame into a presentation-ready table in Python. Recently I tried a library called Great Tables and it makes formatting tables very easy. - Works with Pandas & Polars - 19 formatting methods (currency, percentages, dates, scientific notation) - Export to HTML, LaTeX,...

By Karina | Python | Excel | Stats | DataScience | DataAnalytics
Unified Data Turns Branches Into Profit Engines
Social | Feb 13, 2026

Branches can become profit engines, not cost centers, if supported by unified data and modern infrastructure. Execution is the difference. We discuss this with Benjamin Conant of @alkamitech, co-founder of @Mantl_tech. Watch the full video now: https://t.co/Jamreeb077 https://t.co/L1lau9ecEU

By Jim Marous
Project Seeks to Bring Data Analytics to ‘Analogue’ Football Policing
News | Feb 13, 2026

The Police Digital Service (PDS) has signed a six‑month, £600,000 contract with data‑analytics specialist Bays Consulting to pilot data‑driven planning for football match policing. The initiative seeks to replace traditional analogue risk‑assessment matrices with crowd‑modelling and predictive analytics, aiming for...

By PublicTechnology.net (UK)
From Probabilistic to Proven: The Deterministic Turn in Audience Data Strategy
News | Feb 13, 2026

TV advertising is shifting from probabilistic to deterministic audience data. Recent studies show IP‑based identity links to the correct household only 13% of the time, undermining reach and measurement. Deterministic signals such as authenticated ISP or publisher subscriber data can...

By Streaming Media
Cloudera Enables Faster, Accurate AI and Analytics with Unified Data Access Capabilities
News | Feb 12, 2026

Cloudera announced that its AI Inference and Data Warehouse with Trino are now available for on‑premises deployment. The AI Inference service leverages NVIDIA’s Blackwell GPUs, Dynamo‑Triton server and NIM micro‑services to run LLMs, computer‑vision and other models inside customer data...

By Database Trends & Applications (DBTA)
UK Customers Aren't as Worried About Sovereignty as EU, Cisco Exec Says
News | Feb 12, 2026

Cisco’s EMEA president Gordon Thomson told The Stack that British companies are less preoccupied with data‑sovereignty than their European counterparts. He noted that infrastructure autonomy has become a board‑level fear across the region, while AI localisation requirements are muddying the...

By The Stack (TheStack.technology)
Amazon’s Send to Alexa Plus Makes the Kindle Scribe Feel More Like a Productivity Device
News | Feb 12, 2026

Amazon introduced Send to Alexa Plus, a new feature for Kindle Scribe and Scribe Colorsoft that lets users push handwritten notes or PDFs to Alexa’s AI assistant. Alexa can summarize content, generate to‑do lists, calendar events, reminders, and even draft...

By The Verge
AI PoC to Production: A Practical Guide to Scaling Artificial Intelligence in the Enterprise
News | Feb 12, 2026

Enterprises often excel at AI proofs‑of‑concept but stumble when scaling to production, where reliability, governance, and measurable ROI are mandatory. The guide outlines a seven‑step framework—starting with early success criteria, strengthening data pipelines, building cloud‑native infrastructure, adopting MLOps, enforcing governance,...

By Datafloq
Xinnor's Alternative Software RAID Filer for AI
News | Feb 12, 2026

Software RAID vendor Xinnor unveiled xiNAS, an all‑flash NAS filer built on its xiRAID stack, XFS, and NFS over RDMA, targeting AI, HPC and data‑intensive workloads. In a Supermicro validation, a single node achieved up to 74.5 GB/s sequential read and...

By Blocks & Files
India at the Digital Turning Point: How Virtual Twins, AI and Data Are Rewiring Industry
News | Feb 12, 2026

The final India Leadership Talks episode highlighted how virtual‑twin technology and model‑based engineering are moving from promise to practice across manufacturing, infrastructure and life sciences. Leaders from Godrej, KPMG and IndianOil Adani Ventures described a shift from basic digitalisation to...

By ET EnergyWorld (The Economic Times)
Does Your TV Track You Even Through the HDMI Port? Short Answer: Yes
News | Feb 12, 2026

Smart TVs can monitor content played on HDMI‑connected devices using two methods: HDMI‑CEC metadata and Automatic Content Recognition (ACR). ACR takes pixel‑level snapshots to fingerprint shows, movies, or games, while CEC logs device IDs and usage duration. The article outlines...

By ZDNet – Big Data
Why Declarative (Lakeflow) Pipelines Are the Future of Spark
Blog | Feb 11, 2026

Spark is evolving from low‑level RDD and notebook‑driven workflows to declarative pipelines, branded as Lakeflow on Databricks. The new framework lets engineers define flows, datasets, and pipelines in a configuration‑first manner, while Spark handles execution for both batch and streaming....

By Confessions of a Data Guy
Robin Moffatt on the Evolution of Data Engineering: From Batch Jobs to Real-Time | Podcast Interview
Blog | Feb 11, 2026

Robin Moffatt discusses how data engineering has shifted from traditional batch processing to real‑time streaming in a recent podcast interview. He outlines the technical drivers—cloud scalability, event‑driven architectures, and low‑latency analytics—that enable continuous data pipelines. Moffatt also highlights emerging tools...

By Confessions of a Data Guy
Mayo Clinic Platform Standardizes Cancer Data to Speed Up Trials
News | Feb 11, 2026

Mayo Clinic Platform’s Orchestrate tool has added new capabilities that deliver standardized, research‑ready cancer data. The upgrade leverages the OMOP Oncology common data model to transform unstructured inputs such as pathology reports and imaging into consistent tumor characteristics, biomarkers, and...

By HIT Consultant
Data Pipeline Design Playbook 2026
News | Feb 11, 2026

The 2026 Data Pipeline Design Playbook positions pipeline architecture as the decisive factor separating data‑driven firms from laggards. It outlines seven modern frameworks—including the kappa shift, ELT over ETL, medallion data lakes, microservice pipelines, and lambda balancing—to achieve real‑time consistency,...

By AI Accelerator Institute
Driving Safer AVs Faster with Smart Simulation, Neural Reconstruction, and Data-Centric Tools - Ep. 289
Podcast | Feb 11, 2026 | 45 min

In this episode, Rohan Bhasin of Foretellix and Dan Gural of Voxel51 discuss how autonomous‑vehicle (AV) teams can transform massive drive‑log datasets into high‑fidelity simulations using neural reconstruction, scenario‑driven data curation, and NVIDIA‑accelerated pipelines. They explain how these tools enable...

By The AI Podcast (NVIDIA)
Get the Rundown on Data Engineering Trends for 2026 with Informatica, lakeFS, and Aerospike
News | Feb 11, 2026

Data engineering in 2026 is shifting from batch warehouses to real‑time, cloud‑native ecosystems that feed AI and generative models. Leaders like Informatica, lakeFS and Aerospike stress that active data, automated governance, and AI‑driven predictive scaling are essential to avoid bottlenecks....

By Database Trends & Applications (DBTA)
Versioning and Testing Data Solutions: Applying CI and Unit Tests on Interview-Style Queries
Blog | Feb 11, 2026

The article walks through solving a Tesla interview question in Python, calculating each car maker’s net product launch change between 2019 and 2020 using pandas. It then refactors the script into a reusable function and adds a unit‑test suite to...
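
The refactor-then-test workflow described here might look like the sketch below. It assumes the question means "distinct products launched in 2020 minus distinct products launched in 2019, per maker", and it uses plain Python in place of pandas to stay dependency-free. The sample rows, the maker names, and the `net_launch_change` function name are all invented for illustration.

```python
import unittest
from collections import Counter

# Hypothetical launch records: (year, maker, product). One row is duplicated
# on purpose to show that repeated records are counted once.
SAMPLE_LAUNCHES = [
    (2019, "MakerA", "a1"),
    (2019, "MakerA", "a2"),
    (2020, "MakerA", "a3"),
    (2019, "MakerB", "b1"),
    (2020, "MakerB", "b2"),
    (2020, "MakerB", "b3"),
    (2020, "MakerB", "b3"),
]


def net_launch_change(launches, base_year=2019, compare_year=2020):
    """Return {maker: launches in compare_year minus launches in base_year}."""
    counts = {base_year: Counter(), compare_year: Counter()}
    for year, maker, _product in set(launches):  # set() drops duplicate rows
        if year in counts:
            counts[year][maker] += 1
    makers = sorted(set(counts[base_year]) | set(counts[compare_year]))
    # Counter returns 0 for a maker that has no launches in a given year.
    return {m: counts[compare_year][m] - counts[base_year][m] for m in makers}


class NetLaunchChangeTest(unittest.TestCase):
    def test_sample_rows(self):
        self.assertEqual(
            net_launch_change(SAMPLE_LAUNCHES), {"MakerA": -1, "MakerB": 1}
        )

    def test_empty_input(self):
        self.assertEqual(net_launch_change([]), {})
```

Saved as a module, the tests can be run with `python -m unittest`, which is the kind of check a CI job would execute on every commit.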

By KDnuggets
5 XGBoost Hacks From a Kaggle Grandmaster
Social | Feb 11, 2026

XGBoost Tips from 5x Kaggle Grandmaster Chris Deotte Top 5 ways to improve your ML models:

By Matt Dancho
Re-Air: Data Teams at the Crossroads: Proving Value in a Changing Business Landscape with Ben Rogojan
Podcast | Feb 11, 2026 | 52 min

In this re‑aired episode, John interviews Ben Rogojan, owner of Seattle Data Guy, about how data teams can demonstrate value amid tighter budgets and rapid AI advances. They discuss shifting from output‑focused metrics like dashboards to outcome‑driven results, the importance...

By The Data Stack Show
Sponsored: Factory-First: How Modular Construction Becomes the Only Scalable Path for the Next Era of Data Centers
News | Feb 11, 2026

The data‑center sector is racing to deliver gigawatt‑scale campuses amid soaring AI demand, tight labor markets, and long equipment lead times. Traditional on‑site construction cannot keep pace, prompting a shift toward factory‑first modular building. By standardizing designs and producing electrical,...

By Data Center Dynamics
AI Agents Boost Platform Growth by Simplifying Data and Code
Social | Feb 11, 2026

When thinking through the future of software, it’s helpful to think through what will we produce more of vs. less of in the future due to agents. And what systems are tied to that production or consumption. Whether it’s a new...

By Aaron Levie
China’s Top Chipmaker Warns Rushed AI Capacity Could Sit Idle
News | Feb 11, 2026

China’s leading semiconductor manufacturer SMIC warned that a rush to purchase AI chips is prompting companies to build a decade’s worth of data‑center capacity in just one or two years. CEO Zhao Haijun said the rapid build‑out is outpacing clear...

By Bloomberg – Technology
Iceberg's New API Validates DuckDB's Catalog Metadata Claim
Social | Feb 11, 2026

The fact that Iceberg has introduced a scan planning API is a tacit admission that DuckLake is right and the metadata should just live in the catalog. https://t.co/PgsOWxx1v0

By George Fraser
ESMA’s Digital and Data Strategies Support Supervision of EU Financial Markets
News | Feb 11, 2026

The European Securities and Markets Authority (ESMA) has launched a new Digital Strategy for 2026‑2028 and refreshed its Data Strategy covering 2023‑2028. Both roadmaps aim to accelerate digital transformation, simplify supervisory reporting and harness data‑driven insights across the European System...

By ESMA – Press
Teradata Tops Expectations on Public Cloud Momentum and Its Stock Surges
News | Feb 11, 2026

Teradata reported fourth‑quarter earnings of $0.74 per share, well above the $0.54 consensus, and revenue of $421 million, a 3% year‑over‑year increase. Recurring revenue now represents 87% of total sales, while public‑cloud annual recurring revenue jumped 15% to $701 million. The company...

By SiliconANGLE – Big Data
Jack Ma-Backed Ant Bets on AI Health in $69 Billion Sector Race
News | Feb 10, 2026

Ant Group, the Jack Ma‑backed fintech giant, is shifting its growth engine from digital payments to artificial‑intelligence‑powered health care. After a stalled IPO five years ago, the company has become one of China’s largest investors in medical AI, funding platforms...

By Bloomberg – Technology
How to Design Complex Deep Learning Tensor Pipelines Using Einops with Vision, Attention, and Multimodal Examples
News | Feb 10, 2026

The MarkTechPost tutorial showcases how Einops can express complex tensor transformations for deep‑learning pipelines with concise, readable syntax. It walks through real‑world patterns such as vision patchification, multi‑head attention, and multimodal token packing, demonstrating each operation using rearrange, reduce, repeat,...

By MarkTechPost
AI Can Predict Your Future Salary Based on Your Photo, Boffins Claim
News | Feb 10, 2026

Researchers applied an AI model to LinkedIn photos of over 96,000 MBA graduates, extracting Big Five personality traits and showing they predict program rank, initial compensation, salary trajectory, and job transitions. The algorithm builds on a 2020 study that has...

By The Register – AI/ML (data-related)
Best Tools for Test Data Management to Accelerate QA Teams in 2026
News | Feb 10, 2026

Test Data Management (TDM) tools are becoming essential for QA and DevOps teams as CI/CD pipelines demand rapid, compliant data provisioning. In 2026, vendors such as K2view, Delphix, Datprof, IBM Optim, Informatica, and Broadcom lead the market, each emphasizing self‑service,...

By HackRead
Wesco International Pushes Digital Overhaul Amid Q4 Sales Growth
News | Feb 10, 2026

Wesco International closed 2025 with record $23.5 billion in sales, an 8% increase year‑over‑year, and a 10% jump in Q4 revenue. The distributor invested more than $35 million in a unified data lake and AI‑driven tools to replace legacy systems across its...

By Digital Commerce 360