Automated document-to-podcast conversion lowers the friction for repurposing written content into polished audio, letting creators and businesses quickly produce multi-voice episodes for distribution or accessibility. The approach also shows how LLMs combined with multi-speaker TTS can automate content transformation and editing workflows at scale.
The developer built a web app that converts uploaded documents (PDF, Markdown, or plain text) into multi-voice podcast episodes, using Gemini 3 to generate the script and a multi-speaker TTS API to produce the audio. The interface offers controls for tone (roast, steelman, or explain like a fifth grader), target length range, and voice selection, and displays a timeline with playback and download options. In demos, the tool turned a credit-card statement into a two-voice comedic roast and summarized a robotics benchmark paper as a short, kid-friendly explainer. The project was scaffolded with a Python TTS backend and a Next.js-style frontend, drawing on the Gemini API docs and an agent-driven planning step to assemble features quickly.
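The document-to-script-to-audio flow described above can be sketched roughly as follows. This is a minimal illustration, not the developer's actual code: the tone wording, the `Name: line` script format, and every function name here are assumptions. The LLM and TTS calls are passed in as plain callables (`generate_script`, `synthesize`) so that no real Gemini or TTS API signature is implied.

```python
from dataclasses import dataclass
from typing import Callable, List

# Tone presets mirror the controls described above; the instruction wording is assumed.
TONES = {
    "roast": "Write a playful comedic roast of the document.",
    "steelman": "Present the strongest version of the document's argument.",
    "eli5": "Explain the document the way you would to a fifth grader.",
}


@dataclass
class Turn:
    speaker: str
    text: str


def build_prompt(document: str, tone: str, speakers: List[str],
                 min_words: int, max_words: int) -> str:
    """Assemble the script-writing prompt sent to the LLM (hypothetical format)."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone!r}")
    return (
        f"{TONES[tone]}\n"
        f"Write a podcast dialogue between {', '.join(speakers)}.\n"
        f"Length: {min_words} to {max_words} words.\n"
        "Format every turn as 'Name: line'.\n\n"
        f"DOCUMENT:\n{document}"
    )


def parse_script(script: str, speakers: List[str]) -> List[Turn]:
    """Split 'Name: line' text into turns; other non-empty lines extend the previous turn."""
    turns: List[Turn] = []
    for raw in script.splitlines():
        name, sep, text = raw.partition(":")
        if sep and name.strip() in speakers and text.strip():
            turns.append(Turn(name.strip(), text.strip()))
        elif turns and raw.strip():
            turns[-1].text += " " + raw.strip()
    return turns


def make_episode(document: str, tone: str, speakers: List[str],
                 generate_script: Callable[[str], str],
                 synthesize: Callable[[List[Turn]], bytes],
                 min_words: int = 150, max_words: int = 300) -> bytes:
    """Run the pipeline: prompt -> LLM script -> speaker turns -> audio bytes."""
    prompt = build_prompt(document, tone, speakers, min_words, max_words)
    turns = parse_script(generate_script(prompt), speakers)
    if not turns:
        raise ValueError("model returned no usable dialogue turns")
    return synthesize(turns)
```

Injecting the two callables keeps the orchestration testable offline. In a real backend, `generate_script` would wrap the LLM request and `synthesize` would map each speaker name to a chosen voice before calling the TTS service.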