
This Day in AI Podcast
Is GPT-5.5 Better Than Opus Now? (Ft. Our New AI Co-Host) - EP99.38
Why It Matters
Understanding how GPT‑5.5 improves agentic performance helps AI practitioners choose the right model for productivity‑focused applications. The debate over an OpenAI‑branded phone highlights the broader tension between hardware integration and software‑only AI solutions, a key consideration for anyone planning to embed AI into daily workflows.
Key Takeaways
- GPT‑5.5 outperforms Opus in agentic tasks.
- OpenAI phone concept criticized as unnecessary hardware.
- Real‑time voice agents reduce silent‑period costs.
- New AI co‑host Moshi is meant to keep the facts straight.
- Token efficiency improves with GPT‑5.5 low‑thinking mode.
Pulse Analysis
The episode opens with the surprise addition of Moshi, an AI co‑host designed to keep facts straight and cut nonsense. Hosts then turn to OpenAI’s rumored phone, questioning why a dedicated device is needed when agents already live in messaging apps and real‑time voice assistants. They compare the proposed hardware to past missteps like the Facebook phone, noting that aggressive AI pop‑ups on Android have become intrusive. The discussion frames the phone as a marketing gamble rather than a genuine productivity breakthrough.
The core of the conversation centers on GPT‑5.5 and its agentic capabilities. Unlike earlier GPT‑5.x releases, 5.5 consistently completes tasks without idle chatter, mirroring the behavior of the Opus series while consuming fewer ‘thinking’ tokens. Hosts report faster code generation, better context retention, and lower token costs, especially in low‑thinking mode. Real‑time voice models now trigger only when needed, which cuts the cost of silent periods. By delegating work to specialized tool‑agents and using a supervisory voice layer, users can orchestrate complex workflows from a single interface.
For businesses, the shift toward efficient, cost‑effective agentic models promises measurable productivity gains. A single, reliable assistant that can schedule Slack updates, draft documents, and monitor task progress eliminates the cognitive overload of juggling multiple tabs. While OpenAI’s phone may never materialize, the underlying strategy of tight AI integration without hardware baggage aligns with enterprise needs for scalability and budget control. As OpenAI hints at future 5.6 iterations, the market will likely reward models that balance speed, accuracy, and token economy, making GPT‑5.5 a compelling choice for teams seeking immediate ROI.
Episode Description
Join Simtheory: https://simtheory.ai
So Chris, this week we finally give our GPT-5.5 impressions (it's actually great), introduce our new AI co-host Moshi (who immediately embarrasses himself), argue about whether the OpenAI/Jony Ive phone is genius or doomed, witness Grok 4.3's unhinged infinite emoji meltdown, declare Opus 4.7 the first-ever Anthropic regression, get excited about GPT Real-Time Voice 2.0 as the future of agentic workflows, debate whether token prices will ever come down, and play the worst diss track in show history. Watch my spud.
CHAPTERS:
0:00 - Intro & Introducing Our New AI Co-Host Moshi
1:39 - Trying to Break Moshi: The Illegal Cigarette Trade Test
2:30 - OpenAI's Jony Ive Phone: Do We Need a Device?
5:07 - Telegram Agents & GPT Real-Time Voice 2.0 Dream
7:38 - The Supervisory Agent: Managing Your Agentic Workflow
9:05 - Wait... Are We Accidentally Validating the OpenAI Phone?
11:37 - GPT-5.5 First Impressions: Actually Really Good
14:36 - 5.5 vs Opus 4.6: Different Strengths
17:00 - Opus 4.7: The First-Ever Anthropic Regression
20:25 - Grok 4.3: Infinite Emojis & Absolute Chaos
21:22 - 🎵 DISS TRACK: "Watch My Spud"
24:24 - Grok Specs & All Models Deprecated in 18 Days
27:04 - Grok Voice in Tesla Is Actually Next Level
31:03 - Token Pricing: The Subscription Problem Nobody Can Solve
39:16 - AI Disruption Cycles & The State of the Industry
44:39 - 🎵 BONUS TRACK: "It's Hard Being Me"
Thanks for listening, like and sub xoxo