
We’re Introducing Three Audio Models in the API

May 7, 2026

OpenAI


Why It Matters

By delivering live multilingual translation and voice agents that can reason and call tools, OpenAI’s audio models let businesses build voice‑first experiences that automate tasks and serve audiences across languages in real time.

Key Takeaways

•OpenAI adds three real‑time audio models to its API.
•GPT‑Realtime‑Translate supports live translation from 70+ input languages into 13 output languages.
•GPT‑Realtime‑2 enables voice agents with reasoning and tool calling.
•Models can switch languages mid‑sentence and handle technical terminology.
•Agents stay conversational while performing background actions and updates.

Summary

OpenAI announced three new real‑time audio models for its API, showcasing GPT‑Realtime‑Translate and GPT‑Realtime‑2 in a live demo. The translation model streams spoken input and produces natural‑sounding speech in the target language. It accepts more than 70 input languages and renders 13 output languages, and in the demo it followed a speaker switching mid‑sentence between French, English, and German while preserving technical terms. The third model, GPT‑Realtime‑Whisper, provides streaming speech‑to‑text.

Key capabilities include translation that keeps pace with the speaker without post‑processing, and a voice‑agent model that can reason, call external tools, and speak short preambles that explain what it is about to do. The demo illustrated calendar look‑ups, CRM updates, and continuous listening that pauses only on command, highlighting parallel tool calling and spoken acknowledgments while tools run.
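The preamble and parallel tool calling described above can be illustrated with a minimal sketch. It does not use the OpenAI API: the tool functions `lookup_calendar` and `update_crm`, the `run_tool_calls` helper, and the shape of the `calls` list are all hypothetical stand-ins, and the `say` callback simply collects text where a real agent would speak. The sketch only shows the general pattern of acknowledging the user first and then running independent tools concurrently.

```python
import asyncio


# Hypothetical tools standing in for real calendar and CRM integrations.
async def lookup_calendar(day: str) -> dict:
    await asyncio.sleep(0.1)  # simulate network latency
    return {"tool": "lookup_calendar", "day": day, "meetings": ["Quarterly review"]}


async def update_crm(contact: str, note: str) -> dict:
    await asyncio.sleep(0.1)  # simulate network latency
    return {"tool": "update_crm", "contact": contact, "note": note, "status": "updated"}


# Registry mapping tool names (as a model might request them) to functions.
TOOLS = {"lookup_calendar": lookup_calendar, "update_crm": update_crm}


async def run_tool_calls(calls, say):
    """Emit a preamble, then run every requested tool concurrently."""
    names = ", ".join(call["name"] for call in calls)
    say(f"One moment while I run: {names}.")  # acknowledgment before the work starts
    pending = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    # gather() runs the coroutines concurrently and keeps results in request order.
    return await asyncio.gather(*pending)


def demo():
    spoken = []  # stands in for audio output
    calls = [
        {"name": "lookup_calendar", "arguments": {"day": "Friday"}},
        {"name": "update_crm", "arguments": {"contact": "Acme", "note": "Follow up"}},
    ]
    results = asyncio.run(run_tool_calls(calls, spoken.append))
    return spoken, results


if __name__ == "__main__":
    spoken, results = demo()
    print(spoken[0])
    for result in results:
        print(result)
```

Because both simulated tools wait at the same time, the total delay is roughly that of the slowest tool rather than the sum of the two, which is what lets an agent stay conversational while background actions complete.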

Notable moments included the presenter speaking French while the model rendered English audio, an interruption in German that the system handled seamlessly, and a voice assistant fetching meeting details and updating a CRM record, all while maintaining a natural conversational flow.

These models promise to dissolve language barriers, enable voice‑first applications across media, customer support, and education, and allow developers to embed real‑time reasoning and automation directly into products, positioning voice as a primary user interface.

Original Description

We’re introducing three audio models in the API that unlock a new class of voice apps for developers. With these models, developers can build voice experiences that feel more natural, respond more intelligently, and take action in real time:

• GPT‑Realtime‑2, our first voice model with GPT‑5‑class reasoning that can handle harder requests and carry the conversation forward naturally.

• GPT‑Realtime‑Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.

• GPT‑Realtime‑Whisper, a new streaming speech-to-text model that transcribes speech live as the speaker talks.
