Big Data Podcasts

Podcast•Mar 24, 2026•0 min

Spark, AI, and the Future of Data Engineering with Daniel Aronovich

In this episode, host Dan Beach chats with data engineering veteran Daniel Aronovich about his 15‑year journey from MATLAB‑based signal processing at Intel to Python, Spark, and his current startup, True Data Flynn. Daniel explains how he transitioned from data science to data engineering, the challenges of scaling data pipelines on AWS EMR, and why he prefers PySpark over Scala. He also shares practical job‑search advice—leveraging LinkedIn to connect directly with technical hiring managers—and reflects on the rapid evolution of Spark, especially the impact of Databricks’ managed platform.

By Data Engineering Central

Podcast•Mar 23, 2026•30 min

Inside OpenAI’s Streaming Backbone with Aravind Suresh | Ep. 24

In this episode, Aravind Suresh, head of OpenAI's real‑time infrastructure team, explains how the company built a highly reliable, scalable streaming backbone for products like ChatGPT using Kafka and Flink. He describes the challenges of scaling a streaming platform tenfold...

By Streaming Audio (Kafka / Confluent)

Podcast•Mar 23, 2026•56 min

#352 AI Agents at Work: What Actually Breaks (and How to Fix It) with Danielle Crop, EVP Digital Strategy &...

In this episode, Danielle Crop, EVP of Digital Strategy & Alliances at WNS, discusses the rapid rise of AI agents in enterprises, emphasizing the need to evaluate whether they deliver real value and operate securely. She advocates a balanced mindset...

By DataFramed

Podcast•Mar 18, 2026•0 min

DuckDB, AI, and the Future of Data Engineering

In this episode, Dan Beach chats with State Farm staff engineer Matt Martin about his journey from industrial engineering to data engineering, his deep involvement with DuckDB, and the evolving landscape of data platforms. Matt shares how early automation with...

By Data Engineering Central

Podcast•Mar 16, 2026•30 min

The 1 Billion Row Challenge with Gunnar Morling | Ep. 23

In this episode, Tim talks with Gunnar Morling, a principal technologist at Confluent and a key contributor to projects like Hibernate and Debezium, about his "One Billion Row Challenge"—a viral coding contest he launched for the Java community in January...

By Streaming Audio (Kafka / Confluent)

Podcast•Mar 11, 2026•41 min

Re-Air: Data Tools, Templates, and the Trouble with “Easy” Solutions with the Cynical Data Guy

In this re‑aired episode, hosts Eric Dodds and John Wessel chat with regular guest Matt, the Cynical Data Guy, about the rise of low‑code data tools like Clay and the evolving role of the “GTM engineer.” They debate whether such...

By The Data Stack Show

Podcast•Mar 8, 2026•54 min

The Iceberg Ecosystem Today (W/ Anders Swanson)

In this episode, Anders Swanson, a developer experience advocate at dbt Labs, walks through the current state of the Apache Iceberg ecosystem, covering how open‑source and cloud vendors are converging on shared standards, the rise of external catalog integrations, and...

By The Analytics Engineering Podcast

Podcast•Mar 6, 2026•42 min

AEC’s Single Source of Truth: Reality or Pipe Dream?

In this episode the hosts explore whether a true single source of truth (SSOT) for construction project data is achievable or merely aspirational. Newforma’s Dave Wagner and Carl Veillette argue that a single vendor solution is unrealistic; instead, the goal...

By AEC Business

Podcast•Feb 26, 2026•0 min

🎥 MSCI's Luke Flemmer - "Bringing Clarity to Investment Decisions"

In this episode, Luke Flemmer, head of private assets at MSCI, explains how standardizing and normalizing data can unlock transparency, price formation, and liquidity in private markets, drawing parallels to past evolutions in bonds, FX, and equities. He argues that...

By Alt Goes Mainstream

Podcast•Feb 23, 2026•38 min

Killing Clusters & Orchestrating Chaos with Colt McNealy | Ep. 20

In this episode Tim Berglund talks with Colt McNealy, founder and CEO of Little Horse, about building a Kafka‑based platform for orchestrating microservice workflows and AI agents. Colt describes how his early experience debugging monolithic code with GDB contrasted with...

By Streaming Audio (Kafka / Confluent)

Podcast•Feb 23, 2026•45 min

#347 Let's Get Physical with AI with Ivan Poupyrev, CEO at Archetype AI

In this episode, Ivan Poupyrev, CEO of Archetype AI, explains that "physical AI" goes far beyond robotics, embedding foundation‑model intelligence into everyday devices—from washing machines to HVAC systems—and enabling them to communicate and optimize as a unified system. He outlines...

By DataFramed

Podcast•Feb 18, 2026•48 min

Petra Durnin: You Don't Need More Tech — You Need Better Data

In this episode, Petra Durnin, a veteran CRE researcher and tech‑to‑impact strategist, explains why the industry’s biggest hurdle isn’t more tools but cleaner, more integrated data. She walks through her career trajectory, from a temp analyst to leading data and...

By The Crexi Commercial Real Estate Podcast | CRE Insights & Strategies

Podcast•Feb 17, 2026•40 min

Data Is the New Oil, and Your Database Is the only Way to Extract It

In this episode, Ryan interviews Shireesh Thota, Corporate Vice President of Azure Databases at Microsoft, about the rapid evolution of Microsoft's database offerings, including SQL Server, Cosmos DB, and Postgres, and how they fit into a unified Azure data platform....

By Stack Overflow Podcast

Podcast•Feb 11, 2026•45 min

Driving Safer AVs Faster with Smart Simulation, Neural Reconstruction, and Data-Centric Tools - Ep. 289

In this episode, Rohan Bhasin of Foretellix and Dan Gural of Voxel51 discuss how autonomous‑vehicle (AV) teams can transform massive drive‑log datasets into high‑fidelity simulations using neural reconstruction, scenario‑driven data curation, and NVIDIA‑accelerated pipelines. They explain how these tools enable...

By The AI Podcast (NVIDIA)

Podcast•Feb 11, 2026•52 min

Re-Air: Data Teams at the Crossroads: Proving Value in a Changing Business Landscape with Ben Rogojan

In this re‑aired episode, John interviews Ben Rogojan, owner of Seattle Data Guy, about how data teams can demonstrate value amid tighter budgets and rapid AI advances. They discuss shifting from output‑focused metrics like dashboards to outcome‑driven results, the importance...

By The Data Stack Show

Podcast•Feb 9, 2026•27 min

Fail Fast & Ship It with Jeremy Custenborder | Ep. 18

In this episode, Viktor Gamov interviews Jeremy Custenborder of Confluent about his journey from a paper boy to a leader in large‑scale systems, focusing on his experience keeping MySpace operational at massive pre‑cloud scale. Jeremy explains how he built custom...

By Streaming Audio (Kafka / Confluent)

Podcast•Feb 9, 2026•1h 7m

#345 How to Drive Innovation with Brian Solis, Head of Global Innovation at ServiceNow

In episode #345, DataFramed hosts Adel Nehme and Richie Cotton sit down with Brian Solis, Head of Global Innovation at ServiceNow, to explore how organizations can foster a culture of continuous innovation. Solis emphasizes the importance of aligning innovation with...

By DataFramed

Podcast•Feb 5, 2026•55 min

Airbnb’s Open-Source GraphQL Framework with Adam Miskiewicz

In this episode, Adam Miskiewicz, Principal Software Engineer at Airbnb, explains how the company built Viaduct, an open‑source data‑oriented service mesh and GraphQL platform that unifies a central schema across millions of microservices. He details the architectural principles—centralized schema, consistent...

By Software Engineering Daily – Data

Podcast•Feb 3, 2026•1h 6m

#290: Always Be Learning

In this episode, Tim Wilson, Val Kroll, and Spotify product manager/data scientist Mårten Schultzberg discuss the limits of focusing solely on win rates in experimentation and introduce a broader "learning rate" metric that captures wins, regressions (avoiding bad outcomes), and neutral...

By Digital Analytics Power Hour

Podcast•Feb 2, 2026•30 min

From “This May Never Work” To WarpStream with Richie Artoul | Ep. 17

In this episode, Tim Berglund chats with data infrastructure veteran Richie Artoul about his unconventional path—from running a LAN gaming café to building log storage at Datadog and now leading WarpStream at Confluent. Richie shares the technical and cultural challenges...

By Streaming Audio (Kafka / Confluent)

Podcast•Jan 30, 2026•34 min

It's Friday, Juan and Tim Rant with Data Day Texas Takeaways

In this 34‑minute episode, Juan and Tim unwind over a beer to discuss recent developments in the data landscape and share their key takeaways from Data Day Texas. They cover topics such as the hype around AI versus real monetary...

By Catalog & Cocktails

Podcast•Jan 26, 2026•30 min

Inside $3M GPU Racks: Powering Modern AI with Bryan Oliver | Ep. 16

In this episode, Adi Polak interviews Bryan Oliver of Thoughtworks about his journey from building swimming pools to engineering massive GPU racks for AI workloads. Oliver explains the technical and operational challenges of running $3M GPU data centers, focusing on...

By Streaming Audio (Kafka / Confluent)

Podcast•Jan 20, 2026•36 min

From Evidence to Adoption: How datosX Is Redefining Digital Health Validation

In this episode, Unity Stoakes interviews Robin Roberts, CEO of datosX Digital Health Labs, about transforming digital health validation from a bottleneck into a catalyst for adoption. Roberts explains how datosX leverages tier‑1 health system partnerships to run regulatory‑grade validation...

By StartUp Health NOW

Podcast•Jan 20, 2026•1h 10m

#289: The Imperative of Developing Business Acumen

In episode #289 the hosts discuss the essential role of business acumen for data and analytics professionals, defining it as both a grasp of general business fundamentals (finance, marketing, P&L) and deep knowledge of one’s own organization and industry context....

By Digital Analytics Power Hour

Podcast•Jan 19, 2026•6 min

Agent Psychosis: Are We Going Insane?

In this episode, Armin Ronacher warns that AI agent psychosis could be making us collectively uneasy, while Dan Abramov breaks down the AT Protocol as a social filesystem for decentralized apps. RepoBar is highlighted as a tool that surfaces your...

By Practical AI

Podcast•Jan 19, 2026•34 min

Hacking Kafka Streams with Sophie Blee‑Goldman | Ep. 15

In this episode, Tim Berglund interviews Sophie Blee‑Goldman of Responsive about her journey from a Google internship to becoming a specialist in container orchestration and Kafka Streams. They dive into the technical challenge of scaling a Kafka Streams application for...

By Streaming Audio (Kafka / Confluent)

Podcast•Jan 15, 2026•43 min

Teaching AI How to Forget

In this episode Ben Lorica interviews Ben Luria, CEO and co‑founder of Hirundo, about the rising importance of machine unlearning for enterprise AI systems. They explore how organizations can remove or forget specific data points from trained models to comply...

By The Data Exchange

Podcast•Jan 15, 2026•52 min

America Under Surveillance with Michael Soyfer

In this episode, Kevin Ball talks with Institute for Justice attorney Michael Soyfer about the rapid expansion of surveillance technologies such as automated license‑plate readers, facial‑recognition cameras, and predictive policing tools across U.S. municipalities. Soyfer explains the Fourth Amendment challenges...

By Software Engineering Daily – Data

Podcast•Jan 13, 2026•25 min

#266 The CFO’s Secret Weapon Behind Higher Business Valuations: The Data Cube with David Whitcombe, Founder and Managing Director, Data...

In this episode, Kevin Appleby and data‑analytics expert David Whitcombe explain how a "data cube"—a unified, governed layer that pulls together ERP, CRM, and operational data—gives CFOs a single source of truth that drives higher valuations in private‑equity exits. By...

By GrowCFO Show

Podcast•Jan 13, 2026•1h 14m

#534: Diskcache: Your Secret Python Perf Weapon

In this episode Michael Kennedy talks with Vincent Warmerdam about DiskCache, a SQLite‑backed, dictionary‑like cache that persists to disk and works safely across threads and processes. They explain how DiskCache’s @cache.memoize decorator and FanoutCache sharding enable cheap, high‑performance caching for...

By Talk Python to Me

Podcast•Jan 12, 2026•21 min

Turning Chaos Into Push-Button Provisioning with Dhiraj Suri | Ep. 14

In this episode, Viktor Gamov interviews Dhiraj Suri of Confluent about his journey from a software developer at NetApp to a systems engineering leader focused on stream governance. Dhiraj explains how he tackled the challenge of integrating fragmented tools at...

By Streaming Audio (Kafka / Confluent)

Big Data Pulse