
Gradient Dissent
Baseten’s rise illustrates how the AI inference market has evolved from a niche model‑serving service into a critical, high‑throughput production layer. Early on the company built generic deployment tools for small internal models, but the explosion of large language and diffusion models forced a re‑examination of scalability, SLA guarantees, and GPU orchestration. This shift mirrors a broader industry trend in which inference—running trained models at scale—has become the primary revenue engine, demanding robust infrastructure that can handle thousands of GPUs while maintaining low latency.
The turning point for Baseten was less a dramatic pivot than a strategic refocus on emerging demand. The launch of ChatGPT set new user‑experience standards, and the open‑source breakthrough of Stable Diffusion demonstrated that developers expect high‑quality, production‑ready APIs. Early adopters like Patreon, experimenting with Whisper for subtitles, and the Riffusion project, which needed 100–150 A10 GPUs for music generation, validated the market’s appetite for reliable, cost‑effective inference. Baseten leveraged its pre‑existing serving stack, quickly expanding its surface area to support massive models, and secured a Series B round that propelled ARR into the eight‑figure range. These developments underscore how rapid product iteration and alignment with ecosystem expectations can unlock exponential growth in a seemingly commoditized space.
Beyond technology, Tuhin Srivastava emphasizes perseverance, timing, and a tightly knit team culture as core entrepreneurial drivers. Maintaining operational discipline—focused customer support, lean meeting rituals, and a clear boundary between performance metrics and performative processes—has allowed Baseten to scale past 100 employees without sacrificing the startup ethos. For business leaders, the lesson is clear: stay adaptable to market signals, invest in infrastructure that can scale with model size, and nurture a culture that balances rigor with flexibility. As AI inference continues to dominate the value chain, companies that master these dynamics will shape the next wave of AI‑powered products.
In this episode of Gradient Dissent, Lukas Biewald talks with Tuhin Srivastava, CEO and founder of Baseten, one of the fastest-growing companies in the AI inference ecosystem. Tuhin shares the real story behind Baseten’s rise and how the market finally aligned with the infrastructure they’d spent years building.
They get into the core challenges of modern inference, including why dedicated deployments matter, how runtime and infrastructure bottlenecks stack up, and what makes serving large models fundamentally different from smaller ones.
Tuhin also explains how vLLM, TensorRT-LLM, and SGLang differ in practice, what it takes to tune workloads for new chips like the B200, and why reliability becomes harder as systems scale.
The conversation dives into company-building, from killing product lines to avoiding premature scaling while navigating a market that shifts every few weeks.
Connect with us here:
Tuhin Srivastava: https://www.linkedin.com/in/tuhin-srivastava/
Lukas Biewald: https://www.linkedin.com/in/lbiewald/
Weights & Biases: https://www.linkedin.com/company/wandb/