This Day in AI Podcast

GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2

December 12, 2025 • 1h 3m

Key Takeaways

•GPT‑5.2 adds 400k context but remains largely unchanged.
•The model produces verbose, overly‑cautious replies that hinder tool‑calling chains.
•Anthropic’s Claude iterates, offering more reliable agentic behavior.
•OpenAI’s new guardrails introduce unnecessary warnings, reducing utility.
•Grok and Gemini deliver cheaper, faster models, challenging OpenAI.

Pulse Analysis

OpenAI’s GPT‑5.2 hit the market with a 400k‑token context window and a modest price bump to $1.75 per million input tokens. While the larger window and vision tool upgrades sound impressive, most users report that the core model feels unchanged, delivering overly verbose answers that add little value. The release feels like a tuning exercise rather than a breakthrough, and the higher cost raises questions about ROI for businesses seeking efficient, concise outputs.

The biggest pain point emerges in tool‑calling and agentic workflows. GPT‑5.2 often fails to chain multiple tool calls, stumbling on parallel requests and refusing to self‑correct when a call goes awry. In contrast, Anthropic’s Claude models operate with an internal “clock,” iterating over tasks and allowing users to intervene between rounds. This iterative design yields more reliable agentic behavior, especially for complex, multi‑step processes. OpenAI’s newer ethical guardrails also inject unnecessary disclaimer language, further slowing down interactions and limiting practical utility.

Competitive pressure is reshaping the landscape. Grok 4.1 costs a fraction as much, around $0.20 per million tokens, and delivers faster responses, making it attractive for time‑critical tool‑calling scenarios. Gemini’s integration into Google services is gaining user trust, while DeepSeek’s emergence broke the perception that ChatGPT is the sole AI option. These dynamics have inflicted brand damage on OpenAI and delayed the anticipated “year of agents.” As enterprises experiment with delegation versus iterative collaboration, the market will likely reward models that balance speed, cost, and reliable agentic capabilities.
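To put the two quoted prices side by side, here is a rough back-of-the-envelope sketch. The workload of 10 million input tokens per day is a hypothetical figure chosen for illustration, and the Grok 4.1 price is assumed to apply to input tokens, which the summary does not specify. Output-token pricing is ignored entirely.

```python
# Prices quoted in the summary, in USD per million tokens.
GPT_5_2_INPUT_PRICE = 1.75  # stated as an input-token price
GROK_4_1_PRICE = 0.20       # approximate; ASSUMED here to apply to input tokens

# Hypothetical workload, not taken from the episode.
DAILY_INPUT_TOKENS = 10_000_000


def input_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of sending `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_million


gpt_cost = input_cost(DAILY_INPUT_TOKENS, GPT_5_2_INPUT_PRICE)
grok_cost = input_cost(DAILY_INPUT_TOKENS, GROK_4_1_PRICE)
ratio = gpt_cost / grok_cost

print(f"GPT-5.2 input cost per day:  ${gpt_cost:.2f}")
print(f"Grok 4.1 input cost per day: ${grok_cost:.2f}")
print(f"GPT-5.2 costs about {ratio:.2f}x as much for this workload")
```

Under these assumptions the daily input bill is about $17.50 versus $2.00, a ratio of roughly 8.75 to 1. The ratio does not depend on the workload size, only on the two quoted prices.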

Episode Description

Join Simtheory: https://simtheory.ai

GPT-5.2 is here and... it's not great. In this episode, we put OpenAI's latest model through its paces and discover it can't even identify a convicted serial killer when the text literally says "serial killer." We compare it head-to-head with Claude Opus and Gemini 3 Pro (spoiler: they win). Plus, we reflect on the "Year of Agents" that wasn't, why your barber switched to Grok, Disney's billion-dollar investment to use Mickey Mouse in Sora, and why Mustafa Suleyman should probably be fired. Also featuring: the GPT-5.2 diss track where the model brags about capabilities it doesn't have.

CHAPTERS:

00:00 Intro - GPT-5.2 Drops + Details

01:25 First Impressions: Verbose, Overhyped, Vibe-Tuned

02:52 OpenAI's Rushed Response to Gemini 3

03:24 Tool Calling Problems & Agentic Failures

04:14 Why Anthropic's Models Just Work Better

06:31 The Barber Test: Real Users Are Switching to Grok

10:00 The Ivan Milat Vision Test (Serial Killer Edition)

17:04 Year of Agents Retrospective: What Went Wrong

25:28 The Path to True Agentic Workflows

31:22 GPT-5.2 Diss Track (Yes, Really)

43:43 Why We're Still Optimistic About AI

50:29 Google Bringing Ads to Gemini in 2026

54:46 Disney Pays $1B to Use Mickey Mouse in Sora

56:57 LOL of the Week: Mustafa Suleyman's Sad Tweets

1:00:35 Outro & Full GPT-5.2 Diss Track

Thanks for listening. Like & Sub. xoxox
