The Future of Voice AI Is Here: Real-Time Cloning, On-Device & Live Translation (Gradium CEO)

Data Driven NYC
Apr 15, 2026

Why It Matters

By democratizing real‑time, on‑device voice AI, Gradium lowers barriers for interactive media, accessibility, and privacy‑focused applications, reshaping how businesses engage users through personalized speech.

Key Takeaways

  • Gradium offers real‑time voice AI models for transcription, synthesis, and translation.
  • New on‑device CPU TTS runs high‑fidelity voice cloning locally.
  • Personalized voice cloning enables dynamic game commentary and ALS voice restoration.
  • Open‑source Gradbot framework powers real‑time voice agents with tool integration.
  • Hibiki Zero provides offline multilingual speech translation with voice cloning.

Summary

Gradium’s CEO, Neil Zeghidour, outlined the company’s mission to power real‑time voice applications through a technology‑first approach, delivering speech‑to‑text, text‑to‑speech, and translation models that run at scale. A commercial spin‑off from the nonprofit Kyutai lab, Gradium builds production‑ready infrastructure rather than vertical‑specific products, drawing on expertise from DeepMind, Meta, and quantitative finance.

Key insights include a shift from offline audio generation, such as audiobooks, to interactive, personalized experiences that must be fast and inexpensive. According to the talk, Gradium’s voice cloning outperforms ElevenLabs on the company’s benchmark, capturing nuanced tones from just ten seconds of source audio. Demonstrations ranged from dynamic esports commentary in a mobile game to restoring the voice of an ALS patient using archival recordings.

Notable examples featured a CPU‑based 100‑million‑parameter TTS model delivering high‑fidelity cloning on edge devices, the open‑source Gradbot framework enabling voice‑driven agents with tool‑calling, and Hibiki Zero’s offline multilingual speech translation running on an iPhone without internet. Live demos showcased real‑time transcription, streaming TTS, and seamless voice‑based travel assistance.

The implications are broad: developers can embed low‑latency, cost‑effective voice features into games, live streams, robotics, and accessibility tools without incurring API fees. On‑device processing enhances privacy and opens markets where connectivity is limited, while open‑source releases accelerate ecosystem adoption and innovation.

Original Description

Current voice AI is too slow and expensive for interactive applications like gaming and robotics. Enter Gradium, a commercial spin-off from the Kyutai AI lab. In this demo, Neil Zeghidour showcases their real-time voice infrastructure. Watch their killer features in action: a high-fidelity text-to-speech model running entirely on a CPU, interactive voice agents that maintain natural conversation flow, and real-time speech translation with voice cloning. They even demonstrate restoring the voice of ALS patient Olivier Goy.
00:31 - The backstory: A commercial spin-off from Kyutai Labs.
01:16 - The shift from offline to interactive voice in gaming and live streams.
03:12 - Live demo: AI-generated personalized esports commentary.
04:33 - Restoring the voice of ALS patient Olivier Goy.
05:16 - Creating real-time personalized videos.
06:41 - Running a 100M parameter text-to-speech model locally on a CPU.
08:41 - Building interactive voice agents that use function calling.
11:47 - Hibiki: Real-time, on-device speech-to-speech translation.
Gradium
HOSTED BY:
FirstMark Capital
Matt Turck (Managing Director)
This session was recorded live at a recent Data Driven NYC, our in-person, monthly event series. If you are ever in New York, you can join the upcoming events by following FirstMark on Luma: @rkcap
