
Spark, AI, and the Future of Data Engineering with Daniel Aronovich
In this episode, host Dan Beach chats with data engineering veteran Daniel Aronovich about his 15‑year journey from MATLAB‑based signal processing at Intel to Python, Spark, and his current startup, True Data Flynn. Daniel explains how he transitioned from data science to data engineering, the challenges of scaling data pipelines on AWS EMR, and why he prefers PySpark over Scala. He also shares practical job‑search advice, such as leveraging LinkedIn to connect directly with technical hiring managers, and reflects on the rapid evolution of Spark, especially the impact of Databricks’ managed platform.

Inside OpenAI’s Streaming Backbone with Aravind Suresh | Ep. 24
In this episode, Aravind Suresh, head of OpenAI's real‑time infrastructure team, explains how the company built a highly reliable, scalable streaming backbone for products like ChatGPT using Kafka and Flink. He describes the challenges of scaling a streaming platform tenfold...

#352 AI Agents at Work: What Actually Breaks (and How to Fix It) with Danielle Crop, EVP Digital Strategy &...
In this episode, Danielle Crop, EVP of Digital Strategy & Alliances at WNS, discusses the rapid rise of AI agents in enterprises, emphasizing the need to evaluate whether they deliver real value and operate securely. She advocates a balanced mindset...

DuckDB, AI, and the Future of Data Engineering
In this episode, Dan Beach chats with State Farm staff engineer Matt Martin about his journey from industrial engineering to data engineering, his deep involvement with DuckDB, and the evolving landscape of data platforms. Matt shares how early automation with...

The 1 Billion Row Challenge with Gunnar Morling | Ep. 23
In this episode, Tim talks with Gunnar Morling, a principal technologist at Confluent and a key contributor to projects like Hibernate and Debezium, about his "One Billion Row Challenge"—a viral coding contest he launched for the Java community in January...

Re-Air: Data Tools, Templates, and the Trouble with “Easy” Solutions with the Cynical Data Guy
In this re‑aired episode, hosts Eric Dodds and John Wessel chat with regular guest Matt, the Cynical Data Guy, about the rise of low‑code data tools like Clay and the evolving role of the “GTM engineer.” They debate whether such...

The Iceberg Ecosystem Today (W/ Anders Swanson)
In this episode, Anders Swanson, a developer experience advocate at dbt Labs, walks through the current state of the Apache Iceberg ecosystem, covering how open‑source and cloud vendors are converging on shared standards, the rise of external catalog integrations, and...

AEC’s Single Source of Truth: Reality or Pipe Dream?
In this episode, the hosts explore whether a true single source of truth (SSOT) for construction project data is achievable or merely aspirational. Newforma’s Dave Wagner and Carl Veillette argue that a single vendor solution is unrealistic; instead, the goal...

🎥 MSCI's Luke Flemmer - "Bringing Clarity to Investment Decisions"
In this episode, Luke Flemmer, head of private assets at MSCI, explains how standardizing and normalizing data can unlock transparency, price formation, and liquidity in private markets, drawing parallels to past evolutions in bonds, FX, and equities. He argues that...

Killing Clusters & Orchestrating Chaos with Colt McNealy | Ep. 20
In this episode, Tim Berglund talks with Colt McNealy, founder and CEO of LittleHorse, about building a Kafka‑based platform for orchestrating microservice workflows and AI agents. Colt describes how his early experience debugging monolithic code with GDB contrasted with...

#347 Let's Get Physical with AI with Ivan Poupyrev, CEO at Archetype AI
In this episode, Ivan Poupyrev, CEO of Archetype AI, explains that "physical AI" goes far beyond robotics, embedding foundation‑model intelligence into everyday devices, from washing machines to HVAC systems, and enabling them to communicate and optimize as a unified system. He outlines...

Petra Durnin: You Don't Need More Tech, You Need Better Data
In this episode, Petra Durnin, a veteran CRE researcher and tech‑to‑impact strategist, explains why the industry’s biggest hurdle isn’t more tools but cleaner, more integrated data. She walks through her career trajectory, from a temp analyst to leading data and...

Data Is the New Oil, and Your Database Is the Only Way to Extract It
In this episode, Ryan interviews Shireesh Thota, Corporate Vice President of Azure Databases at Microsoft, about the rapid evolution of Microsoft's database offerings, including SQL Server, Cosmos DB, and Postgres, and how they fit into a unified Azure data platform....

Driving Safer AVs Faster with Smart Simulation, Neural Reconstruction, and Data-Centric Tools - Ep. 289
In this episode, Rohan Bhasin of Foretellix and Dan Gural of Voxel51 discuss how autonomous‑vehicle (AV) teams can transform massive drive‑log datasets into high‑fidelity simulations using neural reconstruction, scenario‑driven data curation, and NVIDIA‑accelerated pipelines. They explain how these tools enable...

Re-Air: Data Teams at the Crossroads: Proving Value in a Changing Business Landscape with Ben Rogojan
In this re‑aired episode, John interviews Ben Rogojan, owner of Seattle Data Guy, about how data teams can demonstrate value amid tighter budgets and rapid AI advances. They discuss shifting from output‑focused metrics like dashboards to outcome‑driven results, the importance...

Fail Fast & Ship It with Jeremy Custenborder | Ep. 18
In this episode, Viktor Gamov interviews Jeremy Custenborder of Confluent about his journey from a paper boy to a leader in large‑scale systems, focusing on his experience keeping MySpace operational at massive pre‑cloud scale. Jeremy explains how he built custom...

#345 How to Drive Innovation with Brian Solis, Head of Global Innovation at ServiceNow
In episode #345, DataFramed hosts Adel Nehme and Richie Cotton sit down with Brian Solis, Head of Global Innovation at ServiceNow, to explore how organizations can foster a culture of continuous innovation. Solis emphasizes the importance of aligning innovation with...

Airbnb’s Open-Source GraphQL Framework with Adam Miskiewicz
In this episode, Adam Miskiewicz, Principal Software Engineer at Airbnb, explains how the company built Viaduct, an open‑source data‑oriented service mesh and GraphQL platform that unifies a central schema across millions of microservices. He details the architectural principles: centralized schema, consistent...

#290: Always Be Learning
In this episode, Tim Wilson, Val Kroll, and Spotify product manager/data scientist Mårten Schultzberg discuss the limits of focusing solely on win rates in experimentation and introduce a broader "learning rate" metric that captures wins, regressions (avoiding bad outcomes), and neutral...

From “This May Never Work” To WarpStream with Richie Artoul | Ep. 17
In this episode, Tim Berglund chats with data infrastructure veteran Richie Artoul about his unconventional path—from running a LAN gaming café to building log storage at Datadog and now leading WarpStream at Confluent. Richie shares the technical and cultural challenges...

It's Friday, Juan and Tim Rant with Data Day Texas Takeaways
In this 34‑minute episode, Juan and Tim unwind over a beer to discuss recent developments in the data landscape and share their key takeaways from Data Day Texas. They cover topics such as the hype around AI versus real monetary...

Inside $3M GPU Racks: Powering Modern AI with Bryan Oliver | Ep. 16
In this episode, Adi Polak interviews Bryan Oliver of Thoughtworks about his journey from building swimming pools to engineering massive GPU racks for AI workloads. Oliver explains the technical and operational challenges of running $3M GPU data centers, focusing on...

From Evidence to Adoption: How datosX Is Redefining Digital Health Validation
In this episode, Unity Stoakes interviews Robin Roberts, CEO of datosX Digital Health Labs, about transforming digital health validation from a bottleneck into a catalyst for adoption. Roberts explains how datosX leverages tier‑1 health system partnerships to run regulatory‑grade validation...

#289: The Imperative of Developing Business Acumen
In episode #289, the hosts discuss the essential role of business acumen for data and analytics professionals, defining it as both a grasp of general business fundamentals (finance, marketing, P&L) and deep knowledge of one’s own organization and industry context....

Agent Psychosis: Are We Going Insane?
In this episode, Armin Ronacher warns that AI agent psychosis could be making us collectively uneasy, while Dan Abramov breaks down the AT Protocol as a social filesystem for decentralized apps. RepoBar is highlighted as a tool that surfaces your...

Hacking Kafka Streams with Sophie Blee‑Goldman | Ep. 15
In this episode, Tim Berglund interviews Sophie Blee‑Goldman of Responsive about her journey from a Google internship to becoming a specialist in container orchestration and Kafka Streams. They dive into the technical challenge of scaling a Kafka Streams application for...

Teaching AI How to Forget
In this episode, Ben Lorica interviews Ben Luria, CEO and co‑founder of Hirundo, about the rising importance of machine unlearning for enterprise AI systems. They explore how organizations can remove or forget specific data points from trained models to comply...

America Under Surveillance with Michael Soyfer
In this episode, Kevin Ball talks with Institute for Justice attorney Michael Soyfer about the rapid expansion of surveillance technologies such as automated license‑plate readers, facial‑recognition cameras, and predictive policing tools across U.S. municipalities. Soyfer explains the Fourth Amendment challenges...

#266 The CFO’s Secret Weapon Behind Higher Business Valuations: The Data Cube with David Whitcombe, Founder and Managing Director, Data...
In this episode, Kevin Appleby and data‑analytics expert David Whitcombe explain how a "data cube"—a unified, governed layer that pulls together ERP, CRM, and operational data—gives CFOs a single source of truth that drives higher valuations in private‑equity exits. By...

#534: Diskcache: Your Secret Python Perf Weapon
In this episode, Michael Kennedy talks with Vincent Warmerdam about DiskCache, a SQLite‑backed, dictionary‑like cache that persists to disk and works safely across threads and processes. They explain how DiskCache’s @cache.memoize decorator and FanoutCache sharding enable cheap, high‑performance caching for...

Turning Chaos Into Push-Button Provisioning with Dhiraj Suri | Ep. 14
In this episode, Viktor Gamov interviews Dhiraj Suri of Confluent about his journey from a software developer at NetApp to a systems engineering leader focused on stream governance. Dhiraj explains how he tackled the challenge of integrating fragmented tools at...