Today's Big Data Pulse

Data‑Engineering Bottlenecks Shift From Legacy Tech to Leadership Gaps
Three 2026 surveys of 1,629 data professionals show that weak leadership direction and poor requirements now account for 40% of top‑bottleneck votes, outpacing legacy systems at 25%. By April, 50% of respondents cite lack of clear ownership as the biggest pain point, while better tooling is mentioned by under 5%.
Also developing:
By the numbers: Ampere Analysis acquires PlumResearch

Unlocking the Power of Public Sector Data by Overcoming Common Strategy Pitfalls
Public sector organisations view data as a strategic asset, yet many treat data strategy as a one‑off document that quickly becomes obsolete. The article outlines common pitfalls—treating strategy as paperwork, ignoring people and culture, lacking clear purpose, and failing to maintain an ongoing process. It then proposes a holistic, continuous approach that blends people, process, technology, and governance to turn data initiatives into tangible citizen outcomes. Made Tech offers expertise to assess maturity, align goals, and embed a living data program across government agencies.

MARS Coalition Advocates for Data-Driven Road Safety in the US
The Modern Analytics for Roadway Safety (MARS) Coalition is urging Congress to modernise federal road safety programs by adopting AI, telematics and predictive analytics. These technologies allow agencies to spot crash risks before they materialise, moving from reactive to preventive...

Capgemini Joins OpenAI's Frontier Alliance to Scale Enterprise AI
Capgemini has joined OpenAI’s newly launched Frontier Alliance as a founding partner, creating a dedicated delivery function to scale AI agents for enterprises. The firm will deploy OpenAI‑certified professionals to tackle data readiness, integration, operating‑model design and governance challenges. Capgemini...

Find Duplicate Rows in SQL Server with a CTE
The article shows how to locate and list duplicate rows in a SQL Server table using a Common Table Expression (CTE) that groups all columns and counts occurrences. It presents two queries: one that returns only unique rows (order_count = 1) and...
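As a rough illustration of the grouping-and-counting idea, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere, and the `orders` table, its columns, and the `order_count > 1` variant are assumptions made for the example, not taken from the article.

```python
import sqlite3

# Hypothetical table and data; the article's own schema is not shown in this digest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "widget", 10),
        ("alice", "widget", 10),  # exact duplicate of the row above
        ("bob", "gadget", 25),
        ("carol", "widget", 10),
    ],
)

# The CTE groups on every column and counts how often each combination occurs.
QUERY = """
WITH row_counts AS (
    SELECT customer, product, amount, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer, product, amount
)
SELECT customer, product, amount, order_count
FROM row_counts
WHERE order_count {op} 1
ORDER BY customer
"""

# order_count = 1 keeps rows that occur once; order_count > 1 (assumed here)
# lists the combinations that occur more than once.
unique_rows = conn.execute(QUERY.format(op="=")).fetchall()
duplicate_rows = conn.execute(QUERY.format(op=">")).fetchall()

print(unique_rows)
print(duplicate_rows)
```

The same `WITH ... GROUP BY ... COUNT(*)` shape should carry over to SQL Server, though column lists and types would need to match the real table.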
Lætitia AVROT: Mostly Dead Is Slightly Alive: Killing Zombie Sessions
PostgreSQL administrators frequently encounter zombie sessions: backend processes that remain active or idle in transaction after a client vanishes. Linux’s default TCP keepalive idle time of two hours lets these dead connections retain locks and block vacuum, inflating the process list. The...
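To make the two-hour figure concrete, the sketch below estimates how long a dead peer can go unnoticed and builds a client connection string with tighter keepalive settings. The `keepalives*` names are standard libpq connection parameters; the host, database name, chosen values, and both helper functions are hypothetical, and this is not necessarily the remedy the article recommends.

```python
def detection_time_s(idle_s: int, interval_s: int, count: int) -> int:
    """Rough time to declare a silent peer dead: the idle wait, then
    `count` unanswered probes sent `interval_s` seconds apart."""
    return idle_s + interval_s * count


def keepalive_dsn(host: str, dbname: str, idle_s: int = 60,
                  interval_s: int = 10, count: int = 3) -> str:
    """Build a libpq-style connection string with client-side keepalives."""
    params = {
        "host": host,
        "dbname": dbname,
        "keepalives": 1,                    # enable TCP keepalives
        "keepalives_idle": idle_s,          # seconds of silence before probing
        "keepalives_interval": interval_s,  # seconds between probes
        "keepalives_count": count,          # lost probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
linux_default = detection_time_s(7200, 75, 9)
tuned = detection_time_s(60, 10, 3)
dsn = keepalive_dsn("db.example.internal", "app")  # hypothetical host and database

print(linux_default, tuned)
print(dsn)
```

The server has matching settings (`tcp_keepalives_idle`, `tcp_keepalives_interval`, `tcp_keepalives_count`) that cover clients which do not set their own.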

South Korea, Australia, Portugal Top OECD Digital Government Index for 2025
The OECD’s 2025 Digital Government Index (DGI) places South Korea at the top with a 0.95 composite score, followed by Australia (0.88) and Portugal (0.86). Korea is the only nation to break the 0.9 threshold across all six assessment categories,...

Gartner Acknowledges Growth of Decision Intelligence Platforms with Inaugural Magic Quadrant
Gartner released its inaugural Magic Quadrant for Decision Intelligence Platforms, signaling a shift from data‑driven to decision‑centric strategies. The report highlights legacy players like FICO alongside newer pro‑code solutions such as Quantexa, and notes that generative AI integration remains early....
NDAP Overhaul in Works to Handle Surge in Big Data
India’s National Data and Analytics Platform (NDAP) will undergo a major revamp as NITI Aayog seeks a private‑sector partner to redesign, operate and hand over the system. The upgrade aims to handle vastly larger data volumes, add advanced analytics and...

The Rise of Location Intelligence: Turning Geographic Data Into Competitive Advantage
Location intelligence is moving from a background reporting tool to a strategic asset as businesses combine geographic data with operational metrics. By layering spatial context onto demand, infrastructure and behavior datasets, firms uncover patterns that traditional analytics miss. AI and...
How Data Analytics Is Transforming Modern Risk Assessment
Data analytics is reshaping risk assessment from a reactive practice into a predictive science across finance, insurance, healthcare, and transportation. Predictive modeling, machine‑learning, and real‑time dashboards now enable firms to forecast exposure, micro‑segment customers, and allocate capital with greater confidence....

Macquarie Partners with KINX & Gabia for South Korean Data Center Build-Out
Macquarie Asset Management’s Asia‑Pacific Infrastructure Fund 4 has teamed with South Korean IT firm Gabia and its network subsidiary KINX to launch a $420 million hyperscale data‑center venture. The joint‑venture will initially build a 40 MW facility in Ansan, near Seoul, and aims to...

Google Files for Fifth Data Center at Midlothian Campus in Texas
Google, via shell company Sharka LLC, filed to build a fifth data center on its Midlothian, Texas campus. The $880 million project will span 288,000 sq ft and is slated for completion by February 24, 2027. This addition follows a $100 million fourth building announced in...

Tonic Structural vs Informatica: Which Is Better for Test Data Management?
The article compares Tonic Structural and Informatica for test data management, highlighting that both generate privacy‑safe data but differ in deployment models and feature focus. Informatica is shifting to a cloud‑first strategy after its Salesforce acquisition, limiting on‑premises options, while...

Coforge Advances Data Cosmos, a Next-Gen AI-Enabled, Cloud-Native Data and Analytics Platform Designed to Accelerate Enterprise Transformation
Coforge has launched Data Cosmos, an AI‑enabled, cloud‑native data engineering and analytics platform designed to unify fragmented enterprise data. The solution is organized into five portfolios—Supernova, Nebula, Hypernova, Pulsar, and Quasar—that address modernization, governance, DataOps, and GenAI adoption across the...

NS&I’s Modernisation Programme: A £3bn Lesson in How to Lose Public Trust
The Public Accounts Committee has labeled the National Savings and Investments (NS&I) digital modernisation a “full‑spectrum disaster” after four years of a £3bn programme that lacks an integrated plan, has seen costs triple and deadlines disappear. Parliament found the project...

Day 146: Time Series Database Integration - Turning Logs Into Queryable Metrics
Today's post highlights the shift from raw log files to queryable metrics using time‑series databases. It explains why traditional relational databases falter with high‑write, append‑only workloads and showcases InfluxDB and TimescaleDB as purpose‑built solutions. The article illustrates how these databases...
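The core move from raw logs to queryable metrics can be sketched without any database at all: parse each line, truncate its timestamp to a bucket, and count. The log format, sample lines, and function name below are assumptions for illustration; a real pipeline would write the resulting points to a time-series store such as the ones named above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines in "<ISO timestamp> <LEVEL> <message>" form.
LOG_LINES = [
    "2026-03-02T10:00:05 ERROR payment timeout",
    "2026-03-02T10:00:41 INFO request served",
    "2026-03-02T10:00:59 ERROR payment timeout",
    "2026-03-02T10:01:10 INFO request served",
]


def to_metric_points(lines):
    """Turn raw log lines into (minute bucket, level, count) points."""
    counts = Counter()
    for line in lines:
        ts_text, level, _message = line.split(" ", 2)
        # Truncate the timestamp to the start of its minute.
        bucket = datetime.fromisoformat(ts_text).replace(second=0, microsecond=0)
        counts[(bucket, level)] += 1
    return sorted(
        (bucket.isoformat(), level, n) for (bucket, level), n in counts.items()
    )


points = to_metric_points(LOG_LINES)
for point in points:
    print(point)
```

Each output tuple is an append-only point keyed by time, which is the write pattern the post says time-series databases are built for.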

From Silos to Synergy: How Data Sharing Is Transforming Airports
The aviation sector is moving from isolated legacy systems to open‑architecture platforms that enable real‑time data sharing among air traffic control, airlines, and airports. Searidge Technologies, a NATS subsidiary, showcased its Chorus platform powering tools like Intelligent Stand Manager, which...
Third-Party AI Agents Can Now Plug Into LiveRamp’s Platform
LiveRamp announced that third‑party AI agents can now plug directly into its data collaboration platform, removing the need for custom API calls. The integration enables agents to automate audience planning, segmentation, measurement and to interact with partner and proprietary agents....

A Coding Guide to Build a Scalable End-to-End Analytics and Machine Learning Pipeline on Millions of Rows Using Vaex
The MarkTechPost tutorial walks through building a production‑style analytics and machine‑learning pipeline with Vaex on a synthetic 2 million‑row dataset. It showcases lazy feature engineering, approximate city‑level aggregations, and seamless integration with scikit‑learn via Vaex‑ML. The guide also demonstrates model training,...
Databricks RTM Beats Flink, No Batching Needed
The #1 thing people don't know about Databricks and Apache Spark is the performance of Real-Time Mode (RTM): according to the post, it is faster than Apache Flink and more robust. No more batching.
Do It Best Group’s New Retail Pulse Helps Retailers Turn Data Into Direction
Do it Best Group has launched Retail Pulse, a data‑driven platform that transforms independent hardware dealers’ POS and purchasing data into clear, actionable insights. By aggregating more than 1,000 member datasets, the tool creates tailored peer groups and highlights opportunities...

CSX Modernizes Data Management System
Infosys announced the completion of a large‑scale data modernization program for CSX Corporation, deploying its AI‑first Topaz platform built on Microsoft Fabric and Purview. The effort consolidated CSX’s fragmented data landscape into a unified cloud‑native environment, creating over 170 data...

Storage News Ticker – March 2
Snowflake expanded its Cortex Code CLI to run in local environments, enabling AI‑assisted coding across dbt, Apache Airflow and other non‑Snowflake data sources under a subscription model. London‑based Cristie Software introduced FSBlocker, a lightweight kernel driver that locks down files...

Updating Data Architecture for 2026 with Informatica, Dataiku, Qlik, and CData
The DBTA webinar highlighted that 85% of subscribers plan to modernize data platforms by 2025, driven by the rapid rise of GenAI and large language models. Vendors such as Informatica, Dataiku, Qlik and CData outlined a shift toward modular, AI‑driven...

AI to Transform How Credit Market Works, JPMorgan Banker Says
JPMorgan’s global head of credit trading, Sanjay Jhamna, says generative AI will overhaul credit trading by efficiently processing the asset class’s massive unstructured data. He described credit markets as the last frontier for automation, noting that conventional AI models have...

The Secret Life of Database Keys
The article demystifies database keys, contrasting natural keys—business‑meaning values—with surrogate keys that are system‑generated identifiers. It outlines why surrogates are favored for stability, compactness, and predictable performance, while also noting scenarios where natural keys or composite junction keys are preferable....
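A small sketch of the stability argument, using Python's built-in sqlite3 module: the surrogate key is what other tables reference, while the natural key (here an email address) stays enforced as a unique constraint. Table names, column names, and data are hypothetical and not drawn from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- surrogate key, system-generated
    email       TEXT NOT NULL UNIQUE   -- natural key, kept as a constraint
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
""")

cur = conn.execute("INSERT INTO customer (email) VALUES (?)", ("ana@example.com",))
customer_id = cur.lastrowid
conn.execute(
    "INSERT INTO purchase (customer_id, item) VALUES (?, ?)", (customer_id, "lamp")
)

# The business value changes; only one row is touched and the child
# row's reference stays valid because it points at the surrogate key.
conn.execute(
    "UPDATE customer SET email = ? WHERE customer_id = ?",
    ("ana@new.example", customer_id),
)

row = conn.execute(
    "SELECT c.email, p.item FROM purchase p JOIN customer c USING (customer_id)"
).fetchone()
print(row)
```

Had `email` been the primary key, the same change would have had to cascade into every referencing table.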

Yonyou Unveils the Large Ontology Model (LOM)
Yonyou released its Large Ontology Model (LOM) on February 24, a 4‑billion‑parameter AI that shifts enterprise data from static tables to a dynamic knowledge‑graph architecture. The model automates multi‑source ontology construction and delivers multi‑hop reasoning across procurement, production, sales and...

Boost Pandas Performance with Modin, Dask, Polars
Python tip: when pandas is too slow, other libraries can come to the rescue:
- Modin: the easiest switch from pandas. Change one line (import modin.pandas as pd) and keep the same syntax; it uses all CPU cores.
- Dask: for when data > RAM. Processes data in chunks across CPU...

Druva Uses Graph Relationships to Mine Metadata
Druva has introduced Dru MetaGraph, a graph‑database layer that stores backup metadata as interconnected nodes, enabling AI agents to answer security and compliance questions with real‑time context. The approach stems from three drivers: security queries are fundamentally relationship‑based, customers need instant,...
Buyer’s Guide: Comparing the Leading Cloud Data Platforms
The buyer’s guide evaluates the five dominant cloud data platforms—Databricks, Snowflake, Amazon Redshift, Google BigQuery, and Microsoft Fabric—highlighting their architectures, AI integrations, deployment models, and pricing structures. Databricks champions the lakehouse model with generative AI and open formats, while Snowflake...

AWS UAE Suffers AZ Outage After "Objects Strike Data Center" And Cause Fire, Amid Iran Attacks
Amazon Web Services’ ME‑CENTRAL‑1 region in the United Arab Emirates experienced an Availability Zone outage after unidentified objects struck the data center, igniting a fire and prompting emergency power shutdown. The incident coincided with a wave of Iranian missile and...

How Big Data Is Changing How We Buy and Sell Real Estate
Big data is reshaping real estate by giving developers, agents, and investors real‑time demographic, economic, and environmental insights. Over 80% of agents now use AI‑driven tools, and predictive analytics enable precise scenario modeling for pricing, density, and amenities. The technology...

Replace UNION ALL with GROUPING SETS for Faster Aggregations
Stop writing UNION ALL for multi-level aggregations. You need regional totals AND product totals AND grand totals, so you write three separate queries with UNION ALL. There's a better way: GROUPING SETS. UNION ALL: scans the table 3 times. Slow. GROUPING SETS: One...
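To show why one scan is enough, here is a dependency-free sketch that imitates in plain Python what a `GROUP BY GROUPING SETS ((region), (product), ())` query computes. It is an emulation of the idea, not a database feature; the sample rows and names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
SALES = [
    ("north", "widget", 100),
    ("north", "gadget", 50),
    ("south", "widget", 70),
]

COLUMNS = ("region", "product")
# Mirrors: GROUP BY GROUPING SETS ((region), (product), ())
GROUPING_SETS = [("region",), ("product",), ()]


def aggregate_once(rows):
    """Compute every grouping level while reading the rows a single time."""
    totals = defaultdict(int)
    for region, product, amount in rows:  # the single scan
        record = {"region": region, "product": product}
        for grouped in GROUPING_SETS:
            # Columns outside the current grouping set become None,
            # much like the NULLs a GROUPING SETS result contains.
            key = tuple(record[c] if c in grouped else None for c in COLUMNS)
            totals[key] += amount
    return dict(totals)


totals = aggregate_once(SALES)
for key, value in totals.items():
    print(key, value)
```

The UNION ALL version would instead run three separate aggregations, each reading the full table.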
Databricks Overtakes Snowflake
Databricks has overtaken Snowflake in quarterly revenue, now leading by $120 million after trailing by $220 million two years ago. The shift is driven by AI’s demand for unstructured data, which Databricks processes directly from object storage without migration. Databricks SQL grew...
Purpose Drives Design: Functions of a Statewide Longitudinal Data System
Statewide longitudinal data systems (SLDS) can boost education and workforce outcomes, but designs vary based on intended functions—public reporting, research analytics, and individual support. The brief by Stefaan Verhulst explains how policymakers can align system architecture, governance, and legal frameworks...
Metadata‑Driven MRR Schedules Unlock Revenue Intelligence
As I was building my MRR analysis feature, I realized that there is much more power in our MRR schedule than we realize. With the correct metadata, we have a revenue intelligence engine that will provide more insight for our...
Introducing the Gwenchmarks Manifesto: Learn Benchmark Mastery
Folks asked me "what's your plan for gwenchmarks"? At first, it was a joke. But... teaching people how to plan, execute and read benchmarks is a good goal. So I wrote The Gwenchmarks Manifesto as a start. Still a bit...

New Databricks Offering Targets Next-Generation Data Streaming
Databricks launched Zerobus Ingest, a fully managed serverless streaming service that moves data directly into Delta Lake tables. The platform streams data from sources such as manufacturing systems, financial trading apps, IoT devices, and cybersecurity tools. It promises sub‑five‑second latency,...
Uncovering Hidden Fraud Networks
Entity resolution, knowledge graphs, and geospatial analytics together dismantle hidden fraud networks across government programs and insurance lines. By linking fragmented records—tax filings, social media, transaction logs—into unified 360‑degree profiles, investigators can spot duplicate registrations, synthetic identities, and collusive entities....
DuckDB Lets You Query 10GB Parquet Locally, Ditch Clusters
There's a moment in every data engineer's career when they discover they can query a 10GB Parquet file on their laptop in seconds. That's the DuckDB moment. It changes how you think about what requires a cluster and what doesn't. Spoiler: most...

Spanner Adds Iceberg Lakehouse Support with 200× Faster Scans
"The columnar engine uses a specialized storage mechanism designed to accelerate analytical queries by speeding up scans up to 200 times on live operational data" The new @googlecloud Spanner capability means you can serve Iceberg lakehouse data ... https://t.co/dxmgEAI0cA https://t.co/TUe0vNnzfN

QBO Cloud and MinIO Collaborate to Deliver Enterprise-Grade Object Storage for Modern AI and Analytics Workloads
QBO Cloud and MinIO announced a joint solution that merges QBO’s bare‑metal cloud platform with MinIO’s AIStor, an exascale, S3‑compatible object storage system. The partnership delivers a unified, high‑performance data layer designed for modern AI and analytics workloads, emphasizing scalability,...
Effective Data Strategy Needs Governance, Not Just Storage
A strong data strategy is more than storage. It's context, quality, and governance. The “useless” data may hold insights GenAI needs, but without curation, access controls, and trust, innovation risks becoming noise instead of value. https://t.co/ParkENiwRg
Unified Intelligence: Mastering the Azure Databricks and Azure Machine Learning Integration
The article outlines how Azure Databricks and Azure Machine Learning can be tightly integrated to create a unified intelligence pipeline. Databricks handles large‑scale data ingestion, cleaning, and feature engineering using Spark and Delta Lake, while Azure ML supplies model versioning,...

Data Readiness – Why, and How, Your Data Will Make or Break AI Success in 2026
Legal technology leaders are hosting a March 18 webinar to dissect "data readiness" as the decisive factor for AI success in the legal sector by 2026. They argue that fragmented repositories, inconsistent metadata, and weak governance are the primary obstacles...
Vibhor Kumar: Open Source, Open Nerves
At last year’s CIO Summit in Mumbai, senior leaders from banking, fintech, telecom and manufacturing debated the growing risk profile of open‑source databases, with PostgreSQL emerging as the focal point. The conversation has moved from pure performance to trust, encompassing...

Emerald Intel Launches Embedded Analytics, Delivering a Real-Time Macro View of the Cannabis Industry
Emerald Intelligence has introduced Embedded Analytics, a new SaaS feature that provides real‑time, macro‑level dashboards for the licensed cannabis and hemp market. The initial release includes four interactive dashboards covering state sales, company leaderboards, product sales, and store status, all...

Avoid Common Mistakes in B2B Data Appending: An Executive Guide
Accurate B2B data appending is a strategic lever that drives sales and marketing performance. Companies that rely on internal teams often face technical, resource, and compliance hurdles, leading to stale or incomplete records. Partnering with specialized data‑append providers delivers fresh,...

5 Ways to Make Trusted Data the Backbone of Your Sustainable Supply Chains
Companies face mounting sustainability regulations and consumer scrutiny, yet their legacy supply‑chain systems hold fragmented, inconsistent product data. The article outlines five actions—gaining product visibility, feeding tools with clean inputs, extending traceability beyond distribution, building compliance‑ready data infrastructure, and treating...

MDS and DFRS Cooperate to Drive Vision Zero
Germany’s Mobility Data Space (MDS) and the pan‑European Data for Road Safety (DFRS) consortium have signed an agreement to exchange safety‑related traffic data from connected vehicles across the EU. The partnership enables near‑real‑time sharing of sensor‑derived incident information, supporting the...