
A long-running weekly podcast that explores data science, machine learning, and AI through the lens of skeptical inquiry. Host Kyle Polich covers a wide range of topics, from interviews with AI researchers about new algorithms to mini-series on subjects such as natural language processing, time series, and GANs. Data Skeptic balances technical depth with clarity, making complex concepts approachable. Listeners gain both an understanding of cutting-edge techniques and a healthy skepticism toward hype, as Kyle and guests discuss real-world applications, limitations, and ethical dimensions of AI and data science.

In this episode, Data Skeptic talks with MIT PhD student Kat about how users can collectively manipulate recommender systems and why such coordinated actions can actually benefit the platforms. Kat explains her research, which frames recommender algorithms as multi‑agent games and uses game theory and matrix completion to model strategic user behavior beyond simple two‑player interactions. She highlights survey evidence showing that algorithmic activism, in which groups intentionally boost or suppress content, is more common than expected, and discusses the implications for algorithm design and fairness. The conversation also touches on how collaborative filtering creates incentives for users to act strategically, influencing both their own and others' recommendations.

In this episode, host Kyle Polich interviews Roan Schellingerhout, a PhD candidate researching explainable multi‑stakeholder job recommender systems. Roan explains how his AI‑driven platform uses knowledge graphs, inference rules, and large language models to generate simple textual explanations that users...

In this episode, David Liu, an assistant research professor at Cornell, discusses how standard recommender techniques like PCA and collaborative filtering can unintentionally disadvantage minority and niche user groups. He introduces the notion of "power niche users": highly active users with...

In this episode, researcher Santiago de Leon explains how eye‑tracking technology captures gaze patterns (fixations and saccades) to reveal user behavior on recommender interfaces, and introduces the RecGaze dataset, the first eye‑tracking collection tailored for recommender‑system research. He shows how eye data...

In this episode, Data Skeptic talks with Virginia Tech assistant professor Boya Xu about modern recommender systems, focusing on a hybrid method that blends collaborative filtering, latent embeddings, and bandit learning. Xu explains how the approach tackles the cold‑start problem...

In this episode, Data Skeptic talks with Florian Atzenhofer-Baumgartner, a PhD student developing Monasterium.net, Europe’s largest digital archive of historical charters, about the unique challenges of building recommender systems for the digital humanities. Florian explains how sparse interaction data, cold‑start...

In this episode, host Kyle Polich and postdoc researcher Alberto Carlo Mario Mancino discuss DataRec, a Python library that automates dataset downloading, checksum verification, and standardized filtering for recommender‑system benchmarks such as MovieLens, Last.fm, and Amazon reviews. They highlight how...

The post discusses shilling attacks on recommender systems, where attackers create fake profiles to manipulate collaborative‑filtering algorithms for promotion or sabotage. It explains various attack types (random, segmented, bandwagon, and average) and shows that user‑user filtering is especially vulnerable, needing only a...

The post highlights Rebecca Salganik’s research on fairness in music recommendation systems, outlining group, individual, and counterfactual fairness and the problems of popularity and multi‑interest bias. She presents LARP, a multi‑stage multimodal framework that uses contrastive learning to align text...

The post interviews doctoral candidate Ashmi Banerjee about her research on AI-driven recommender systems that mitigate exposure bias and promote sustainable tourism. She describes using large language models to generate synthetic data, designing recommendation architectures that balance user satisfaction with...

The post features Abhishek Paudel, a PhD student who uses graph‑based methods to improve robotics, machine learning, and planning under uncertainty. He explains how graphs can model environments, capture spatial relationships, and serve as a unifying framework for multi‑level planning...

The post interviews Microsoft Gray Systems Lab principal scientist Yuanyuan Tian about how graph databases uniquely model relationships, enabling complex applications such as fraud detection, security, healthcare, and supply‑chain optimization. It highlights the practical challenges of moving from SQL to...

The post features David Obembe discussing his master’s thesis on creating a conversational interface for process‑mining tools using large language models (LLMs). He explains process mining fundamentals, how event logs become process maps, and how LLMs can speed up insight...