AI Dev 26 X SF | Ashwyn Sharma: Every App Needs a Voice UI. Here's How to Build It

Andrew Ng
May 21, 2026

Why It Matters

Vocal Bridge lowers the barrier to adding voice to an application by handling the speech stack as a managed service. That can shorten the development of multimodal products and could make voice a default interface for future applications.

Key Takeaways

  • Vocal Bridge offers a fully managed voice AI platform.
  • Three integration surfaces: apps, AI agents, and voice as a tool.
  • A React SDK provides bidirectional hooks (onAction, sendAction) linking the voice agent and the UI.
  • The voice agent delegates queries to the LLM only when needed, preserving the LLM's context window.
  • Enables multimodal workflows, such as phone calls and brainstorming, with minimal code.
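The context-window point can be illustrated with a minimal sketch, assuming a design in which the voice agent keeps the full spoken transcript itself and forwards only self-contained queries to the backend LLM. The `DelegatingVoiceAgent` class, the keyword heuristic in `needsLlm`, and the stubbed `askLlm` callback are all hypothetical; the talk does not describe how Vocal Bridge decides when to delegate.

```typescript
// Hypothetical sketch: NOT Vocal Bridge's actual delegation logic.
type Turn = { role: "user" | "agent"; text: string };
type LlmCall = (history: string[]) => string;

class DelegatingVoiceAgent {
  // Everything said in the voice session stays with the voice agent.
  readonly transcript: Turn[] = [];
  // Only delegated exchanges ever reach the backend LLM's context.
  readonly llmHistory: string[] = [];

  constructor(private askLlm: LlmCall) {}

  // Assumed heuristic: task-related questions go to the LLM;
  // greetings and acknowledgements are answered locally.
  private needsLlm(text: string): boolean {
    return /\b(code|bug|explain|why|how)\b/i.test(text);
  }

  handle(userText: string): string {
    this.transcript.push({ role: "user", text: userText });
    let reply: string;
    if (this.needsLlm(userText)) {
      this.llmHistory.push(userText);
      reply = this.askLlm(this.llmHistory);
      this.llmHistory.push(reply);
    } else {
      reply = "Sure, go ahead.";
    }
    this.transcript.push({ role: "agent", text: reply });
    return reply;
  }
}

// Usage with a stubbed LLM that reports how much context it received.
const agent = new DelegatingVoiceAgent(
  (history) => `LLM saw ${history.length} message(s)`,
);
agent.handle("Hi there!");
agent.handle("Okay, one second.");
const answer = agent.handle("Why does my build fail?");
```

Here the session has six transcript turns, but the LLM is called once and sees a single message; the small talk never enters its context.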

Summary

Ashwyn Sharma, CEO of Vocal Bridge, presented a platform that turns an application or AI agent into a voice-first experience. The company positions itself as a one-stop, fully managed solution with three integration surfaces: embedding voice directly into new or existing apps, adding spoken interaction to text-based AI agents, and using voice as a tool for multimodal tasks such as brainstorming or outbound calls.

The core of Vocal Bridge is a React SDK that supplies two hooks, onAction and sendAction, for bidirectional communication between the voice agent and the host UI. Developers define client actions in a simple JSON schema, which lets the agent trigger UI events (e.g., placing a tic-tac-toe mark) and receive the user's actions back, so its responses reflect the current state of the UI. A command-line interface handles configuration, token management, and tool integration without much backend code.

Live demos illustrated these capabilities. A voice-controlled tic-tac-toe game showed real-time state synchronization between agent and UI; a Claude-backed chatbot showed how a single line of code can give a text-only LLM a natural voice; and a brainstorming session showed the agent switching modalities, scheduling talks, and placing phone calls using a predefined schema. Throughout, the voice agent delegated queries to the underlying LLM only when needed, preserving the model's context window.

By abstracting the speech stack (speech-to-text, voice activity detection, endpointing, and turn-taking), Vocal Bridge aims to cut development cycles from months to days. If it delivers, this could accelerate the adoption of voice interfaces across web, mobile, and enterprise software, positioning voice as a universal interaction layer and opening new revenue streams for developers and product teams.
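The client-action pattern can be illustrated with a small, framework-free sketch. Everything here is hypothetical: the action names (`place_mark`, `user_placed_mark`), the schema shape, and the `ActionBridge` class are assumptions for illustration, not Vocal Bridge's actual SDK API or JSON schema. We also assume `onAction` receives agent-triggered actions in the UI and `sendAction` reports user actions back to the agent; in the real SDK both are React hooks.

```typescript
// Hypothetical client-action schema: NOT the real Vocal Bridge format.
type ActionSchema = {
  name: string;
  description: string;
  parameters: Record<string, "number" | "string">;
};
type ActionPayload = Record<string, number | string>;
type Handler = (payload: ActionPayload) => void;

// Actions the agent may trigger in the UI, declared up front.
const clientActions: ActionSchema[] = [
  {
    name: "place_mark",
    description: "Agent places its mark on the tic-tac-toe board",
    parameters: { row: "number", col: "number" },
  },
];

// Minimal stand-in for the onAction/sendAction pair (hypothetical).
class ActionBridge {
  private handlers = new Map<string, Handler>();
  // User events the UI has reported back, kept as agent context.
  readonly agentContext: { name: string; payload: ActionPayload }[] = [];

  constructor(private schema: ActionSchema[]) {}

  // UI side: register a handler for an agent-triggered action.
  onAction(name: string, handler: Handler): void {
    this.handlers.set(name, handler);
  }

  // Agent side: trigger a declared action after validating its parameters.
  agentTrigger(name: string, payload: ActionPayload): void {
    const spec = this.schema.find((a) => a.name === name);
    if (!spec) throw new Error(`Undeclared action: ${name}`);
    for (const [key, type] of Object.entries(spec.parameters)) {
      if (typeof payload[key] !== type) {
        throw new Error(`Bad parameter ${key} for ${name}`);
      }
    }
    this.handlers.get(name)?.(payload);
  }

  // UI side: report a user action back to the agent.
  sendAction(name: string, payload: ActionPayload): void {
    this.agentContext.push({ name, payload });
  }
}

// Usage: a 3x3 board kept in sync between agent and UI.
const board: string[][] = [
  ["", "", ""],
  ["", "", ""],
  ["", "", ""],
];
const bridge = new ActionBridge(clientActions);
bridge.onAction("place_mark", ({ row, col }) => {
  board[row as number][col as number] = "O";
});

// The user clicks a cell; the UI updates and tells the agent.
board[0][0] = "X";
bridge.sendAction("user_placed_mark", { row: 0, col: 0 });

// The agent answers by placing its mark through the declared action.
bridge.agentTrigger("place_mark", { row: 1, col: 1 });
```

Declaring actions in a schema means the agent can only invoke what the UI has explicitly exposed, and malformed calls are rejected before they touch UI state.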

Original Description

Voice AI today is mostly customer service bots. That's about to change — and AI devs will build what comes next. This talk by Vocal Bridge's Ashwyn Sharma introduces Voice UI as an emerging interface category, explains the technical architecture that makes truly multimodal voice experiences possible, and shows you exactly how to build one live on stage.
