Big Data Podcasts

Re-Air: The Rise of the Citizen Developer: Solving Business Problems with Alteryx and AI with Andy Macmillan
Podcast · May 20, 2026 · 50 min

In this re‑aired episode, Alteryx CEO Andy Macmillan discusses the evolution of the citizen developer—business users with enough technical skill to build data solutions—and how AI is reshaping that role. He explains Alteryx’s mission to democratize data preparation and analytics,...

By The Data Stack Show
Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298
Podcast · May 13, 2026 · 23 min

Snap’s engineering platform head, Prudvi Vatala, explains how the company slashed data‑processing costs by 76% and reduced core usage by 62% by migrating its 10‑petabyte‑per‑day experimentation pipeline to GPU‑accelerated Spark using NVIDIA Spark RAPIDS on Google Cloud. The move delivered...

By The AI Podcast (NVIDIA)
Your LLM Issues Are Really Data Issues
Podcast · Apr 28, 2026 · 31 min

In this episode, Ryan Donovan talks with Harsha Chintalapani, co‑founder and CTO of Collate, about why the biggest challenges facing LLMs in production are actually data problems. Harsha explains how issues like schema drift, ambiguous business definitions, data discovery, lineage,...

By Stack Overflow Podcast
Perceptron Network – A Thousand Eyes, One Vision for Decentralized AI Data
Podcast · Apr 22, 2026 · 28 min

In this episode, Andy Pickering talks with Peter Anthony, co‑founder of Perceptron, about the company’s decentralized data infrastructure that taps idle user bandwidth to collect real‑time, geographically diverse web data for AI training. Peter explains how the "thousand eyes, one...

By The Crypto Conversation
Building Banking Systems with Kafka Streams with Mateo Rojas | Ep. 28
Podcast · Apr 20, 2026 · 44 min

In this episode, Mateo Rojas recounts his early experiences building a policy‑management platform for a banking‑type application using Kafka Streams when the technology was still nascent. He describes the challenges of orchestrating multiple microservices via stream joins, handling windowing limits,...

By Streaming Audio (Kafka / Confluent)
Scaling Regulated Data Workflows Without Lock‑In - with Juan Orlandini of Insight
Podcast · Apr 17, 2026 · 22 min

In this episode, Juan Orlandini, CTO of North America at Insight, explains how finance leaders can modernize chaotic, regulated data environments by integrating AI thoughtfully rather than layering it on outdated systems. He stresses that generative AI excels at pattern...

By The AI in Business Podcast
Postgres Can Be Your Data Lake (Pg_lake)
Podcast · Apr 9, 2026 · 0 min

In this episode, Marco introduces pg_lake, an extension that lets PostgreSQL query and manage data lakes stored as Iceberg tables in object storage. By delegating analytical queries to DuckDB’s vectorized engine, pg_lake can achieve up to 100× faster performance than...

By Stanislav’s Big Data Stream (Substack)
#354 Beyond BI: Decision Intelligence with Graphs with Jamie Hutton, CTO at Quantexa
Podcast · Apr 6, 2026 · 46 min

In this episode, CTO Jamie Hutton of Quantexa explains how decision intelligence extends beyond traditional business intelligence by using graph‑based context and entity resolution to create a single, trustworthy view of people, companies, and relationships. He details how Quantexa’s platform...

By DataFramed
Parquet Fundamentals in 3 Mins
Podcast · Apr 3, 2026 · 0 min

The episode explains how Apache Parquet’s hybrid columnar‑row format optimizes storage and query performance for large datasets. It contrasts row‑wise and pure columnar layouts, highlighting the inefficiencies of each, and then describes Parquet’s structure of row groups, column chunks, and...

By VuTrinh (Substack)
Corewell Health’s Jarve Says Population Health Data Challenges Demand Internal Builds
Podcast · Apr 1, 2026 · 37 min

In this episode, Dr. Bob Jarvie, Associate CMIO and Medical Director for Population Health Analytics at Corewell Health, explains why the health system built its own internal population health data platform instead of relying on external vendors. He highlights the...

By healthsystemCIO
#353 The Data Team's Agentic Future with Ketan Karkhanis, CEO at ThoughtSpot
Podcast · Mar 30, 2026 · 49 min

In this episode, ThoughtSpot CEO Ketan Karkhanis discusses how AI agents are reshaping data analytics, turning self‑service BI from a long‑standing promise into a reality. He showcases ThoughtSpot’s agents—Spotter, Spotter Model, and SpotterWiz—that can answer business questions, automate data engineering...

By DataFramed
Your Data Vendor Is Charging You $800K to Solve a $100K Problem
Podcast · Mar 28, 2026 · 0 min

In this episode, Camille Bank reveals how mid‑size companies are paying upwards of $800K annually for data stacks that solve far smaller problems, exposing hidden costs in Snowflake compute, connector services like Fivetran, BI tools, and the salaries of multiple...

By AI Adopters Club
(Video) What Is Apache Spark?
Podcast · Mar 26, 2026 · 0 min

The episode traces the evolution from Google’s MapReduce model to Apache Spark, explaining how Spark’s in‑memory processing and the Resilient Distributed Dataset (RDD) abstraction overcome MapReduce’s limitations for iterative and interactive workloads. It breaks down Spark’s core concepts—transformations vs. actions,...

By VuTrinh (Substack)
The Hidden Complexity Behind Simple Dashboards
Podcast · Mar 25, 2026 · 10 min

In this episode of the Dashboard Effect podcast, hosts Brick Thompson and Landon Oaks explore why the most valuable dashboards are often the simplest in appearance, yet the most complex to build behind the scenes. They share real‑world examples—including a...

By The Dashboard Effect