Krisp Launches VIVA 2.0, Introducing Voice Infrastructure for Voice AI Agents

AiThority • May 6, 2026

Companies Mentioned

LiveKit

Ultralytics

Why It Matters

By cleaning and interpreting raw audio before speech‑to‑text, VIVA dramatically improves downstream transcription and response quality, enabling scalable, reliable voice‑AI deployments in noisy, real‑world environments.

Key Takeaways

•Turn Prediction v3 predicts end‑of‑turn without transcription
•Interrupt Prediction classifies, in real time, whether a user intends to interject
•Signal detectors identify synthetic speech, accent, and gender instantly
•Voice Isolation v3 reduces word error rate across pipelines
•Over 12 billion minutes processed; 130+ products integrated

Pulse Analysis

Voice‑AI adoption has exploded, with usage projected to grow ninefold in 2025. Yet most agents still stumble when confronted with background chatter, echo, or overlapping speech, which can push word error rates from 5% to over 30%. Traditional pipelines chain speech‑to‑text, large language models and text‑to‑speech, but leave a critical gap upstream: the raw audio signal. Without a dedicated front end to filter noise, predict turn boundaries and recognize acoustic cues, even the most sophisticated models can misinterpret user intent, leading to dropped calls and frustrated customers.
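For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name and the sample sentences are assumptions made for this example and do not come from Krisp or VIVA.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution plus one deletion over ten words.
reference = "i would like to cancel my order from last week"
hypothesis = "i would like to cancel my daughter from last"
print(word_error_rate(reference, hypothesis))  # 0.2
```

On this definition, a jump from 5% to over 30% means going from about one wrong word in twenty to roughly one in three.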

Krisp’s VIVA 2.0 fills that gap with a suite of lightweight, CPU‑only models that operate on audio alone. Turn Prediction v3 anticipates when a speaker has finished, eliminating premature interruptions, while Interrupt Prediction detects a user’s intent to take the floor, distinguishing genuine interjections from back‑channel acknowledgments. New signal detectors flag synthetic speech, identify accents, and infer gender, allowing downstream STT engines to select the most appropriate language model. The upgraded Voice Isolation v3 further cleans the signal, delivering measurable reductions in downstream word‑error rates. All components are bundled into existing VIVA pricing, removing cost barriers for enterprises.
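To make the division of labor between the two predictors concrete, here is a minimal sketch of how an agent loop might consume their outputs. Everything in it is hypothetical: the `Action` names, the `decide` function, the probability inputs and the threshold values are assumptions chosen for illustration and are not Krisp's API or published defaults.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"                  # user is still talking; agent stays silent
    RESPOND = "respond"            # user has finished; agent may start speaking
    KEEP_TALKING = "keep_talking"  # back-channel such as "uh-huh"; agent continues
    YIELD = "yield"                # genuine interjection; agent stops speaking


def decide(
    agent_speaking: bool,
    end_of_turn_prob: float,
    interrupt_prob: float,
    end_of_turn_threshold: float = 0.8,  # hypothetical value
    interrupt_threshold: float = 0.7,    # hypothetical value
) -> Action:
    """Pick the agent's next move from two audio-only scores.

    While the agent is speaking, only the interrupt score matters;
    while the user is speaking, only the end-of-turn score matters.
    """
    if agent_speaking:
        if interrupt_prob >= interrupt_threshold:
            return Action.YIELD
        return Action.KEEP_TALKING
    if end_of_turn_prob >= end_of_turn_threshold:
        return Action.RESPOND
    return Action.WAIT


# A low interrupt score while the agent is speaking is treated as a back-channel.
print(decide(agent_speaking=True, end_of_turn_prob=0.0, interrupt_prob=0.2))
```

The point of the sketch is that both decisions are made from audio-derived scores alone, before any transcript exists, which is the property the article attributes to these models.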

The market response has been swift. Over 12 billion minutes of voice‑AI traffic flow through VIVA each year, powering more than 130 platforms ranging from contact‑center solutions to conversational toys. Reported outcomes include a 3.5× improvement in turn‑taking accuracy, a 50% drop in call failures, and a 30% rise in customer satisfaction scores. As voice becomes the primary interface for human‑AI interaction, infrastructure layers like VIVA will be essential for delivering reliable, natural conversations at scale, positioning Krisp as a critical enabler in the emerging voice‑AI ecosystem.
