Why RL Won — Kyle Corbitt, OpenPipe (Acq. CoreWeave)
Latent Space • October 16, 2025

Key Takeaways

  • OpenPipe built a cheap GPT‑4 distillation workflow and hit $1M ARR quickly.
  • Falling model costs eroded that value, driving a shift to RL and LoRA.
  • LoRA provides flexible, low‑cost fine‑tuning versus full model training.
  • RL agents for email and code became the core product focus.
  • The CoreWeave acquisition confirms demand for fine‑tuning infrastructure.

Pulse Analysis

OpenPipe launched in early 2023 to address the prohibitive cost of GPT‑4 in production. By offering a managed distillation pipeline that captured API traffic and produced smaller, cheaper models, the startup secured three enterprise customers within a month and scaled to $1 million ARR in under a year. This rapid traction highlighted a clear market gap: enterprises needed high‑quality language models without the astronomical OpenAI fees.
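The episode doesn't show OpenPipe's actual pipeline, but the general distillation-data pattern it describes can be sketched in a few lines: capture prompt/completion pairs from production GPT‑4 traffic, then emit them as a JSONL fine‑tuning dataset for a smaller model. The function names and the chat-style JSONL schema below are illustrative assumptions, not OpenPipe's API.

```python
import json

def log_interaction(store, prompt, completion):
    """Capture one prompt/completion pair from production API traffic."""
    store.append({"prompt": prompt, "completion": completion})

def to_finetune_jsonl(store):
    """Emit captured traffic as JSONL training examples for a smaller model."""
    return "\n".join(
        json.dumps({
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]
        })
        for ex in store
    )

store = []
log_interaction(store, "Summarize: RL won.", "RL won because ...")
dataset = to_finetune_jsonl(store)
print(dataset)
```

Each logged interaction becomes one supervised training example, so the large model's behavior on real traffic is what the small model learns to imitate.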

As frontier model pricing collapsed and open‑source alternatives improved, OpenPipe’s original value proposition weakened. The team pivoted toward LoRA‑based fine‑tuning, which delivers comparable performance with far lower compute and memory requirements, and began exploring reinforcement‑learning (RL) agents for task‑specific automation. Projects like an email‑handling agent and code‑generation models demonstrated that RL could unlock new use cases beyond static inference, especially after the release of OpenAI’s o1 models. The strategic shift attracted CoreWeave, leading to an acquisition that validates the growing demand for specialized fine‑tuning infrastructure within the AI ecosystem.
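The low-rank idea behind LoRA can be sketched in a few lines of NumPy: instead of updating a full d×d weight matrix, train two small matrices whose product forms the update. This illustrates the generic published LoRA formulation, not OpenPipe's implementation; the dimensions and scaling value below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                         # hidden size, LoRA rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized
alpha = 16.0                        # scaling hyperparameter

def lora_forward(x):
    # Base path plus low-rank update: W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: 2*d*r for the adapter vs d*d for a full-rank update.
print(2 * d * r, "vs", d * d)
```

Because only A and B are trained, many adapters can share one frozen base model, which is what makes LoRA cheap to train and flexible to serve.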

For AI founders, the episode underscores three practical lessons: fine‑tune only when cost, latency, or quality mandates it; prioritize LoRA for flexible, low‑overhead customization; and view RL as a long‑term differentiator once foundational models stabilize. As model prices continue to fall and open‑source offerings mature, startups that can streamline fine‑tuning workflows and integrate RL agents will be well‑positioned for acquisition or scaling in a consolidating market.

Episode Description

In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC's Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted.

The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory: not due to capability limitations, but reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal Reinforcement Learning Elicited Rewards), which uses LLMs as judges to rank agent behaviors relatively rather than absolutely, making RL training accessible without complex reward engineering.

Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO (despite its advantages) may be a dead end due to its requirement for perfectly reproducible parallel rollouts. He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven't lived up to the hype in his testing, and why the hardest part of deploying agents isn't the AI: it's sandboxing real-world systems with all their bugs and edge cases intact.

The discussion also covers OpenPipe's acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle's vision for a future where every deployed agent continuously learns from production experience. He predicts that solving the reliability problem through continuous RL could unlock 10x more AI inference demand from projects currently stuck in development, fundamentally changing how we think about agent deployment and maintenance.
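RULER's implementation isn't detailed here, but the relative-rather-than-absolute idea can be sketched: take a judge's ordering of a group of rollouts and derive rewards from rank alone, so the judge never has to produce calibrated absolute scores. The trajectory ids and the linear rank-to-reward mapping below are illustrative assumptions.

```python
def rank_to_rewards(ranking):
    """Map a judge's relative ranking (best first) to rewards in [0, 1].

    `ranking` is a list of trajectory ids ordered best-to-worst, as an
    LLM judge might return them. The absolute values are arbitrary;
    only the relative ordering matters for RL training.
    """
    n = len(ranking)
    if n == 1:
        return {ranking[0]: 1.0}
    return {traj: 1.0 - i / (n - 1) for i, traj in enumerate(ranking)}

# A hypothetical judge ranked four agent rollouts best-to-worst:
rewards = rank_to_rewards(["b", "d", "a", "c"])
print(rewards)  # b gets 1.0, c gets 0.0
```

Ranking within a group sidesteps the hardest part of reward engineering: a judge that can't say "this rollout is worth 0.73" can usually still say which of two rollouts handled the task better.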

Key Topics:

The rise and fall of fine-tuning as a business model

Why 90% of AI projects never reach production

RULER: Making RL accessible through relative ranking

The environment problem: Why sandboxing is harder than training

GRPO vs PPO and the future of RL algorithms

LoRAs: The underrated deployment optimization

Why GEPA and prompt optimization disappointed in practice

Building world models as synthetic training environments

The $500B Stargate bet and OpenAI's potential crypto play

Continuous learning as the path to reliable agents
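On the GRPO-vs-PPO topic above: GRPO's core trick is replacing PPO's learned value function (critic) with group-relative normalization over parallel rollouts of the same prompt. A minimal sketch of that advantage computation, with made-up reward values for illustration:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each rollout's reward against its
    group's mean and standard deviation, so no learned critic is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored the same: no learning signal in this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four parallel rollouts of the same prompt, scored by a reward function:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)
```

This group-wise scheme is also what makes the reproducible-parallel-rollouts requirement discussed in the episode bite: the advantages only make sense when the group really did face the same environment state.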

References

https://www.linkedin.com/in/kcorbitt/

Aug 2023: https://openpipe.ai/blog/from-prompts-to-models

Dec 2023: https://openpipe.ai/blog/mistral-7b-fine-tune-optimized

Jan 2024: https://openpipe.ai/blog/s-lora

May 2024: https://openpipe.ai/blog/the-ten-commandments-of-fine-tuning-in-prod

https://www.youtube.com/watch?v=-hYqt8M9u_M

Oct 2024: https://openpipe.ai/blog/announcing-dpo-support

AIE NYC 2025, Finetuning 500m agents: https://www.youtube.com/watch?v=zM9RYqCcioM&t=919s

AIEWF 2025, How to train your agent (ART-E): https://www.youtube.com/watch?v=gEDl9C8s_-4&t=216s

Sept 2025, acquisition: https://openpipe.ai/blog/openpipe-coreweave

W&B Serverless RL: https://openpipe.ai/blog/serverless-rl?refresh=1760042248153
