Build Hour: GPT-Realtime-2

OpenAI
May 13, 2026

Why It Matters

GPT‑Realtime‑2 gives businesses a production‑ready, low‑latency voice AI that can orchestrate complex tasks, unlocking hands‑free experiences and faster analytics across industries.

Key Takeaways

  • GPT‑Realtime‑2 adds a 128k‑token context window for longer conversations.
  • Parallel tool calling enables multi‑step voice workflows in real time.
  • The model family covers 70+ input languages and 13 output languages, with speech‑to‑text latency as low as 200 ms.
  • Dynamic voice cloning and tone control improve user experience.
  • Live demos showcase voice‑powered e‑commerce and analytics assistants.

Summary

OpenAI’s Build Hour introduced GPT‑Realtime‑2, the latest voice‑centric model, alongside two companion models that expand on the earlier real‑time translation and Whisper APIs. The session highlighted three new models: GPT‑Realtime‑Translate, a streaming translation engine covering 70+ input languages; GPT‑Realtime‑Whisper, a Whisper‑based speech‑to‑text model with latency as low as 200 ms; and GPT‑Realtime‑2, which delivers GPT‑5‑level reasoning in voice, a 128k‑token context window, parallel tool calling, dynamic voice cloning, and controllable expressiveness.

Key technical advances include a four‑fold increase in context length (to 128k tokens), enabling near‑hour‑long conversations without truncation, and the ability to invoke multiple tools simultaneously rather than sequentially. The model also understands domain‑specific vocabularies, offers tone‑matching and preambles for more natural dialogue, and maintains low‑latency streaming across 13 output languages. These capabilities were demonstrated in two live scenarios.
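
The difference between sequential and parallel tool calls can be illustrated with a minimal sketch. It does not use any OpenAI API: the two tools (search_inventory, get_weather), their arguments, and their 0.2 s latencies are hypothetical stand‑ins, and asyncio.gather plays the role of whatever runtime executes the tool calls the model requests in one turn.

```python
import asyncio
import time

# Hypothetical tools. Names, arguments, and latencies are illustrative
# assumptions, not part of any OpenAI API.
async def search_inventory(query: str) -> dict:
    await asyncio.sleep(0.2)  # simulated network latency
    return {"tool": "search_inventory", "query": query, "hits": 3}

async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.2)  # simulated network latency
    return {"tool": "get_weather", "city": city, "forecast": "rain"}

async def run_sequential() -> list[dict]:
    # Each call waits for the previous one to finish.
    first = await search_inventory("rain jacket")
    second = await get_weather("Seattle")
    return [first, second]

async def run_parallel() -> list[dict]:
    # Both calls are in flight at once; gather preserves argument order.
    results = await asyncio.gather(
        search_inventory("rain jacket"),
        get_weather("Seattle"),
    )
    return list(results)

def timed(coro_fn):
    """Run an async function to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start

if __name__ == "__main__":
    _, seq_s = timed(run_sequential)
    _, par_s = timed(run_parallel)
    print(f"sequential: {seq_s:.2f}s, parallel: {par_s:.2f}s")
```

With independent tools, the parallel version takes roughly as long as the slowest single call rather than the sum of all calls, which matters for a voice agent where every added delay is audible.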

In the e‑commerce demo, the assistant queried product inventories, filtered by price and size, fetched weather forecasts, and added items to a cart, all through voice commands, showcasing real‑time tool orchestration and UI manipulation. A second demo had the model act as a product‑analytics analyst, filtering dashboards, identifying a Safari‑specific regression, and summarizing findings for engineering tickets. Participants noted the model’s ability to stay silent unless prompted, preserving developer control while delivering AI‑driven insights.
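
A rough sketch of the application side of the shopping demo follows. It assumes the model emits a tool name plus JSON‑encoded arguments, which the app then routes to local functions. The catalog, the Cart class, and the tool names (filter_products, add_to_cart) are hypothetical and are not taken from the session's code repo.

```python
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    price: float
    sizes: tuple[str, ...]

# Hypothetical catalog used only for illustration.
CATALOG = [
    Product("J-100", "Rain jacket", 89.0, ("S", "M", "L")),
    Product("J-200", "Down parka", 240.0, ("M", "L")),
    Product("B-300", "Rain boots", 60.0, ("M",)),
]

@dataclass
class Cart:
    skus: list[str] = field(default_factory=list)

def filter_products(max_price: float, size: str) -> list[dict]:
    """Return products at or under max_price that come in the given size."""
    return [
        {"sku": p.sku, "name": p.name, "price": p.price}
        for p in CATALOG
        if p.price <= max_price and size in p.sizes
    ]

def add_to_cart(cart: Cart, sku: str) -> dict:
    """Add a known SKU to the cart and return the updated contents."""
    if not any(p.sku == sku for p in CATALOG):
        raise ValueError(f"unknown sku: {sku}")
    cart.skus.append(sku)
    return {"cart": list(cart.skus)}

def dispatch(cart: Cart, name: str, arguments_json: str):
    """Route one tool call (name + JSON arguments) to its implementation."""
    args = json.loads(arguments_json)
    if name == "filter_products":
        return filter_products(**args)
    if name == "add_to_cart":
        return add_to_cart(cart, **args)
    raise KeyError(f"unknown tool: {name}")

if __name__ == "__main__":
    cart = Cart()
    print(dispatch(cart, "filter_products", '{"max_price": 100, "size": "M"}'))
    print(dispatch(cart, "add_to_cart", '{"sku": "J-100"}'))
```

Keeping the dispatcher explicit, and validating SKUs before mutating the cart, is one way to keep the application in control of what a voice command can actually change.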

The release signals a shift toward voice‑first interfaces that can replace or augment traditional click‑based workflows. Enterprises can now embed hands‑free assistants in customer‑service, smart‑device control, gaming, and internal analytics, potentially reducing friction, accelerating decision‑making, and opening new revenue streams for AI‑enhanced products.

Original Description

Build with the next wave of realtime voice AI. In this Build Hour, you’ll learn how to use GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to build low-latency voice agents that can translate live speech, reason across tools, operate apps, and support more natural voice-to-voice and voice-to-action experiences.
In this session, Teri Yu (Product) and Erika Kettleson (Solutions Engineering) will cover:
• Building with new realtime audio models for translation, streaming speech-to-text, and intelligent voice agents
• Using GPT-Realtime-2 capabilities like preambles, 128K context, parallel tool calling, domain understanding, context over turns, and controllable expressiveness
• Creating voice-powered workflows for shopping and product analytics dashboards
• Customer Spotlight on how Sierra (https://sierra.ai/) is designing production customer experience agents with guardrails, VAD tuning, tracing, redaction, evals, and customer-specific harnesses.
👉 Follow along with the code repo: http://github.com/openai/build-hours
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours
00:00 Welcome and intro
02:06 Realtime voice models overview
02:26 GPT-Realtime-Translate and GPT-Realtime-Whisper demo
04:36 GPT-Realtime-2: three ways to build with voice AI
05:14 What’s new in GPT-Realtime-2
06:58 Demo: Voice-powered search agent
12:32 Demo: Product analytics dashboard
17:24 What can you build with voice AI?
18:36 Customer spotlight: Sierra
29:56 Q&A
42:05 Resources & Upcoming Build Hours
