Thinking Machines Wants to Build an AI that Actually Listens While It Talks

TechCrunch (Main)•May 12, 2026

Companies Mentioned

Thinking Machines

OpenAI

Google (GOOG)

Why It Matters

Full‑duplex AI could dramatically reduce conversational latency, reshaping user experiences across voice assistants and real‑time applications. Early access will let developers test whether faster, overlapping interaction improves productivity and engagement.

Key Takeaways

•TML‑Interaction‑Small replies in a claimed 0.40 seconds while running in full‑duplex mode
•Model processes input and generates output simultaneously, like a phone call
•Research preview slated for limited release in coming months
•Potential to reshape conversational AI latency and user experience

Pulse Analysis

Traditional conversational agents operate in a half‑duplex fashion: they wait for the user to finish speaking before generating a reply. That turn‑taking pattern, inherited from text‑based chat, introduces latency that feels unnatural in spoken dialogue. Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, is betting on a full‑duplex architecture that can listen and speak at the same time, effectively turning a chatbot into a true conversational partner. If successful, this shift could bring AI interactions closer to the flow of a phone call.

Thinking Machines claims a response latency of just 0.40 seconds for its first prototype, dubbed TML‑Interaction‑Small, a figure that rivals the speed of human turn‑taking in everyday conversation. By processing the incoming audio stream while simultaneously generating output, the model sidesteps the buffering delays that plague OpenAI’s GPT‑4o and Google’s Gemini models, which typically pause for 0.8‑1.2 seconds before replying. Thinking Machines says a limited research preview will be available to select partners in the next few months, with a broader rollout planned for later this year.
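The latency argument can be illustrated with a toy model. The sketch below is not Thinking Machines' implementation, and nothing in it comes from the company: the function names, the idea of splitting speech into fixed-length chunks, and every number are hypothetical assumptions chosen for illustration. It compares a half-duplex system, which starts processing only after the speaker finishes, with a streaming system that processes each chunk as it arrives.

```python
def half_duplex_latency(num_chunks: int, compute_s: float) -> float:
    """Delay after end of speech when processing starts only once the
    whole utterance has arrived (hypothetical toy model)."""
    return num_chunks * compute_s


def full_duplex_latency(num_chunks: int, chunk_s: float, compute_s: float) -> float:
    """Delay after end of speech when each chunk is processed as soon as
    it arrives, overlapping with the speaker (hypothetical toy model)."""
    done = 0.0  # time at which the model finishes its current backlog
    for i in range(1, num_chunks + 1):
        arrival = i * chunk_s  # chunk i is fully received at this time
        # Work on a chunk begins when it has arrived and the model is free.
        done = max(done, arrival) + compute_s
    end_of_speech = num_chunks * chunk_s
    return done - end_of_speech


if __name__ == "__main__":
    # Assumed values: 10 chunks of 0.5 s audio, 0.25 s of compute per chunk.
    print(half_duplex_latency(10, 0.25))       # 2.5
    print(full_duplex_latency(10, 0.5, 0.25))  # 0.25
```

Under these assumed numbers, the streaming system answers after the compute time of a single chunk, because all earlier chunks were handled while the user was still speaking. The advantage shrinks when per-chunk compute exceeds the chunk duration, since a backlog then builds up during the utterance.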

Full‑duplex conversational AI opens new use cases such as real‑time translation, interactive tutoring, and hands‑free virtual assistants that can interject without waiting for a pause. However, the approach also raises challenges in maintaining coherence when the model receives overlapping speech and in preventing unintended interruptions. Industry observers will watch how developers integrate this capability into existing platforms and whether the latency advantage translates into measurable productivity gains. If Thinking Machines can deliver a stable, scalable product, it could set a new benchmark for responsiveness across the AI market.
