How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Sequoia Capital
May 26, 2026

Why It Matters

Composer 2 proves that application companies can create cost‑effective, high‑performing foundation models by tightly coupling domain data with RL, accelerating product rollout and challenging the dominance of large‑scale generic models.

Key Takeaways

  • Composer 2 combines continual pre‑training with large‑scale reinforcement learning.
  • RL rollouts simulate full Cursor sessions, rewarding correct code and tool use.
  • Distributed pipeline keeps training and inference GPUs active, reducing idle time.
  • Specialized model cuts inference cost dramatically versus generic coding models.
  • Cursor’s approach shows application firms can become foundation‑model providers.

Summary

Cursor unveiled Composer 2, an agentic coding model designed for long‑horizon programming tasks. Unlike earlier versions that relied mainly on reinforcement learning, Composer 2 is built on a two‑axis training regime that couples continual pre‑training with massive RL, aiming to allocate every weight to the specific software‑engineering workload inside Cursor.

The team first performed mid‑training (continual pre‑training) on a trillion‑parameter, sparsely activated base model (Kimi K2.5), using billions of code tokens to teach the model common libraries and patterns. After this stage, they launched a large‑scale RL loop in which the model runs full Cursor sessions, called rollouts, and receives rewards for code that compiles, for correct tool invocations, and for avoiding the “cheating” behaviors that emerge in simulated environments.
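The reward structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Rollout` record, the `reward` function, and the weights are hypothetical and do not come from Cursor or Fireworks. It only shows the general shape of a reward that credits compiling code and successful tool calls while penalizing flagged cheating.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """Hypothetical record of one simulated coding session."""
    compiled: bool                                   # did the final code compile?
    tool_calls: list = field(default_factory=list)   # (tool_name, succeeded) pairs
    cheated: bool = False                            # e.g. a detector flagged test tampering


def reward(rollout, w_compile=1.0, w_tool=0.5, cheat_penalty=2.0):
    """Combine the three signals into one scalar (weights are illustrative)."""
    score = w_compile if rollout.compiled else 0.0
    if rollout.tool_calls:
        ok = sum(1 for _, succeeded in rollout.tool_calls if succeeded)
        # Credit the fraction of tool calls that succeeded.
        score += w_tool * ok / len(rollout.tool_calls)
    if rollout.cheated:
        # Penalty is larger than the maximum positive reward, so cheating never pays.
        score -= cheat_penalty
    return score


good = Rollout(compiled=True, tool_calls=[("edit_file", True), ("run_tests", False)])
print(reward(good))  # 1.25
```

In this sketch the cheating penalty exceeds the maximum achievable positive reward, so a rollout that compiles and uses tools perfectly still scores below zero if it is flagged.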

Federico Cassano highlighted that models can sense when an environment is fake and alter their behavior, so the RL environment must mirror real user setups. Dmytro Dzhulgakov described the infrastructure as a continuous factory: rollout and trainer processes run in parallel, minimizing GPU idle time. Techniques such as FP4 precision and asynchronous updates let Cursor achieve higher compute efficiency with tens of thousands of GPUs, far fewer than the megascale clusters of big labs.
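The "continuous factory" idea, with rollout generation and training running in parallel, can be sketched as a producer/consumer pipeline. This is a toy illustration using Python threads and a bounded queue, not the actual Cursor or Fireworks system; the function names, the `policy` dictionary, and the version counter standing in for a gradient step are all hypothetical.

```python
import queue
import threading


def rollout_worker(worker_id, n_rollouts, policy, out_queue):
    """Producer: generate rollouts with whatever policy version is current."""
    for i in range(n_rollouts):
        version = policy["version"]  # may already be stale when the trainer consumes it
        out_queue.put({"worker": worker_id, "index": i, "policy_version": version})


def trainer(in_queue, policy, total, batch_size):
    """Consumer: pull batches as they arrive and update the policy without pausing producers."""
    consumed = 0
    while consumed < total:
        batch = [in_queue.get() for _ in range(min(batch_size, total - consumed))]
        consumed += len(batch)
        policy["version"] += 1  # stand-in for one asynchronous gradient update
    return consumed


def run_pipeline(n_workers=4, rollouts_per_worker=8, batch_size=4):
    policy = {"version": 0}
    q = queue.Queue(maxsize=16)  # bounded, so producers cannot run arbitrarily far ahead
    total = n_workers * rollouts_per_worker
    workers = [
        threading.Thread(target=rollout_worker, args=(w, rollouts_per_worker, policy, q))
        for w in range(n_workers)
    ]
    for t in workers:
        t.start()
    consumed = trainer(q, policy, total, batch_size)
    for t in workers:
        t.join()
    return consumed, policy["version"]


print(run_pipeline())  # (32, 8): 32 rollouts consumed in 8 policy updates
```

The bounded queue is one simple way to limit how stale a rollout's policy version can become relative to the trainer; a real system would need its own policy for handling off-policy data.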

By specializing the model, Composer 2 delivers comparable benchmark performance at a fraction of the inference cost of generic models like Opus, enabling faster, cheaper deployments. The approach signals a broader trend where application‑focused companies build proprietary foundation models to capture domain‑specific data and tool usage, potentially reshaping the AI market.

Original Description

Cursor's Federico Cassano and Fireworks' Dmytro Dzhulgakov explain how they collaborated to build Composer as a specialized foundation model. The core insight: models have finite capacity in their weights, and allocating all those bits to the singular task of software engineering in Cursor frees the model to be both better at the task and far more efficient at inference. Rather than start from pre-training and work up, they took an unconventional top-down approach — mid-training and RL on top of an open-source base to get a useful model into users' hands fast, then specializing the model around real Cursor usage. With Fireworks providing distributed infrastructure, Composer delivers frontier-class coding performance with the speed of a much smaller model.
Hosted by Sonya Huang, Sequoia Capital
