We’re Introducing Three Audio Models in the API
Why It Matters
By delivering live, multilingual translation and reasoning‑enabled voice agents, OpenAI’s audio models let businesses build voice‑first experiences that automate tasks and reach multilingual audiences in real time.
Key Takeaways
- OpenAI adds real‑time audio models to its API.
- GPT Realtime Translate supports live translation in 70 languages.
- GPT Realtime 2 enables voice agents with reasoning and tool calling.
- Models can switch languages mid‑sentence and handle technical terminology.
- Agents stay conversational while performing background actions and updates.
Summary
OpenAI announced the rollout of three new real‑time audio models in its API, showcasing GPT Realtime Translate and GPT Realtime 2 in a live demo. The translation model streams spoken input and outputs natural‑sounding speech in the target language. It handles up to 70 languages and can follow a speaker who switches mid‑sentence between French, English, and German while preserving technical terms.
Key capabilities include instantaneous translation without post‑processing, and a voice‑agent model that can reason, call external tools, and provide preambles to explain its actions. The demo illustrated calendar look‑ups, CRM updates, and continuous listening that pauses only on command, highlighting parallel tool calling and transparent acknowledgments during processing.
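The pattern described above (a spoken preamble followed by tool calls that run in parallel) can be sketched in a few lines. This is only an illustration of the concurrency idea, not OpenAI's API: `lookup_calendar`, `update_crm`, `handle_turn`, the `say` callback, and the record ID are all hypothetical names invented for this example, and the tools simply simulate latency.

```python
import asyncio

# Hypothetical stand-ins for the tools shown in the demo; not real API calls.
async def lookup_calendar(day: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"tool": "calendar", "day": day, "meetings": ["Design review"]}

async def update_crm(record_id: str, note: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"tool": "crm", "record_id": record_id, "note": note, "status": "updated"}

async def handle_turn(say) -> list:
    # Preamble: acknowledge the request before any tool runs,
    # so the user is not left in silence while work happens.
    say("One moment, checking your calendar and updating the record.")

    # Parallel tool calling: both coroutines run concurrently,
    # and gather() returns their results in argument order.
    results = await asyncio.gather(
        lookup_calendar("today"),
        update_crm("acct-123", "Follow-up scheduled"),
    )

    meeting_count = len(results[0]["meetings"])
    say(f"Done. Meetings found: {meeting_count}.")
    return list(results)

def run_demo():
    spoken = []  # collects what the agent would say aloud
    results = asyncio.run(handle_turn(spoken.append))
    return spoken, results
```

In a real voice agent, the `say` callback would stream audio back to the user and the tool functions would call actual calendar and CRM services; the structure of acknowledging first and awaiting the tools together would stay the same.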
Notable moments included the presenter speaking French while the model rendered English audio, an interruption in German that the system handled without losing the thread, and a voice assistant fetching meeting details and updating a CRM record, all while maintaining a natural conversational flow.
These models promise to dissolve language barriers, enable voice‑first applications across media, customer support, and education, and allow developers to embed real‑time reasoning and automation directly into products, positioning voice as a primary user interface.