
The episode explains how Apache Parquet’s hybrid columnar‑row format optimizes storage and query performance for large datasets. It contrasts row‑wise and pure columnar layouts, highlighting the inefficiencies of each, and then describes Parquet’s structure of row groups, column chunks, and pages, along with its self‑describing metadata that enables column pruning and efficient reads. The host also notes Parquet’s origins at Twitter and Cloudera and points listeners to a newsletter for deeper data‑engineering content.
In this episode, Dr. Bob Jarvie, Associate CMIO and Medical Director for Population Health Analytics at Corewell Health, explains why the health system built its own internal population health data platform instead of relying on external vendors. He highlights the...

In this episode, ThoughtSpot CEO Ketan Karkhanis discusses how AI agents are reshaping data analytics, turning self‑service BI from a long‑standing promise into a reality. He showcases ThoughtSpot’s agents—Spotter, Spotter Model, and SpotterWiz—that can answer business questions, automate data engineering...

In this episode, Camille Bank reveals how mid‑size companies are paying upwards of $800K annually for data stacks that solve far smaller problems, exposing hidden costs in Snowflake compute, connector services like Fivetran, BI tools, and the salaries of multiple...

The episode traces the evolution from Google’s MapReduce model to Apache Spark, explaining how Spark’s in‑memory processing and the Resilient Distributed Dataset (RDD) abstraction overcome MapReduce’s limitations for iterative and interactive workloads. It breaks down Spark’s core concepts—transformations vs. actions,...

In this episode of the Dashboard Effect podcast, hosts Brick Thompson and Landon Oaks explore why the most valuable dashboards are often the simplest in appearance, yet the most complex to build behind the scenes. They share real‑world examples—including a...

In this episode, host Dan Beach chats with data engineering veteran Daniel Aronovich about his 15‑year journey from MATLAB‑based signal processing at Intel to Python, Spark, and his current startup, True Data Flynn. Daniel explains how he transitioned from data...
In this episode, Aravind Suresh, head of OpenAI's real‑time infrastructure team, explains how the company built a highly reliable, scalable streaming backbone for products like ChatGPT using Kafka and Flink. He describes the challenges of scaling a streaming platform tenfold...

In this episode, Danielle Crop, EVP of Digital Strategy & Alliances at WNS, discusses the rapid rise of AI agents in enterprises, emphasizing the need to evaluate whether they deliver real value and operate securely. She advocates a balanced mindset...

In this episode, Dan Beach chats with State Farm staff engineer Matt Martin about his journey from industrial engineering to data engineering, his deep involvement with DuckDB, and the evolving landscape of data platforms. Matt shares how early automation with...
In this episode, Tim talks with Gunnar Morling, a principal technologist at Confluent and a key contributor to projects like Hibernate and Debezium, about his "One Billion Row Challenge"—a viral coding contest he launched for the Java community in January...

In this re‑aired episode, hosts Eric Dotz and John Wessel chat with regular guest Matt, the Cynical Data Guy, about the rise of low‑code data tools like Clay and the evolving role of the "GTM engineer." They debate whether such...

In this episode, Anders Swanson, a developer experience advocate at dbt Labs, walks through the current state of the Apache Iceberg ecosystem, covering how open‑source and cloud vendors are converging on shared standards, the rise of external catalog integrations, and...

In this episode the hosts explore whether a true single source of truth (SSOT) for construction project data is achievable or merely aspirational. NuFORMA’s Dave Wagner and Carl Beillette argue that a single vendor solution is unrealistic; instead, the goal...

In this episode, Luke Flemmer, head of private assets at MSCI, explains how standardizing and normalizing data can unlock transparency, price formation, and liquidity in private markets, drawing parallels to past evolutions in bonds, FX, and equities. He argues that...