Build Hour: Agent RFT
Why It Matters
Agent RFT lets businesses fine‑tune autonomous AI agents to use their own tools efficiently, cutting latency and inference costs while delivering higher task accuracy—key advantages for scaling AI‑driven workflows.
Summary
In a recent "Build Hour" webcast, OpenAI’s startup marketing lead Christine, alongside engineer Will and solutions architect Theo, introduced Agent Reinforcement Fine‑Tuning (Agent RFT), a new capability that lets developers fine‑tune autonomous agents by rewarding desired tool‑use behavior during training. The session built on earlier tutorials covering agents, the Responses API, and AgentKit, and positioned RFT as the next logical step for teams that have already optimized prompts, task definitions, and tool descriptions but still need higher performance.
The presenters explained that agents differ from standard language models by interacting with external tools—code interpreters, databases, browsers, etc.—and that every tool call is fed back into the model’s context window. While prompt engineering and task simplification can yield modest gains, Agent RFT modifies the model’s weights using a custom reward signal, allowing the agent to explore many tool‑calling strategies and converge on the most efficient ones. The product now supports live tool calls during training, a reward‑endpoint API, and a lightweight penalty on reasoning tokens to curb unnecessary calls, delivering sample‑efficient learning and lower latency.
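The reward‑endpoint idea can be illustrated with a toy sketch: a small HTTP service that receives a rollout's final answer and tool‑call count, and returns a scalar reward with partial credit and a penalty for exceeding the call budget. The payload fields (`expected`, `answer`, `tool_calls`) and the scoring rules here are hypothetical, not the actual Agent RFT API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def grade(expected: str, answer: str, tool_calls: int, budget: int = 10) -> float:
    """Toy reward: 1.0 for an exact match, 0.5 partial credit for a
    substring match, minus a small penalty per tool call over budget."""
    if answer.strip() == expected.strip():
        score = 1.0
    elif expected.strip() in answer:
        score = 0.5
    else:
        score = 0.0
    over_budget = max(0, tool_calls - budget)
    return max(0.0, score - 0.05 * over_budget)


class RewardHandler(BaseHTTPRequestHandler):
    """Hypothetical reward endpoint: POST a JSON rollout, get back a reward."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reward = grade(body["expected"], body["answer"], body.get("tool_calls", 0))
        payload = json.dumps({"reward": reward}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), RewardHandler).serve_forever()
```

In practice the endpoint would run whatever business‑critical checks the customer defines; the point is only that the training loop calls out to customer code for the reward signal.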
A concrete example featured a partnership with Cognition and a demo on a hardened financial‑QA benchmark. The team gave the agent access to a semantic search tool, a directory‑listing tool, and a "cat" tool to retrieve documents from a corpus of 2,800 reports, imposing a ten‑tool‑call budget. Using a model grader that awards partial credit for near‑correct answers, the fine‑tuned agent learned to locate the right report, extract numeric data, and answer within the call limit, illustrating how RFT can improve both accuracy and speed. Theo highlighted the ability to tag each tool call with a rollout ID, enabling customers to track state and apply bespoke grading logic in their own environments.
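The rollout‑ID tagging described above can be sketched as a thin wrapper around the customer's tools: every call is logged with the same ID and counted against the budget, so an external grader can reconstruct the trajectory. The class, field names, and toy corpus tools below are illustrative assumptions, not the actual demo code.

```python
import uuid
from typing import Callable, Dict, List


class RolloutSession:
    """Tags each tool call with a shared rollout ID and enforces a
    tool-call budget (hypothetical wrapper for stateful grading)."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], max_calls: int = 10):
        self.rollout_id = str(uuid.uuid4())
        self.tools = tools
        self.max_calls = max_calls
        self.log: List[dict] = []  # trajectory the grader can inspect later

    def call(self, name: str, arg: str) -> str:
        if len(self.log) >= self.max_calls:
            raise RuntimeError(f"tool-call budget of {self.max_calls} exhausted")
        result = self.tools[name](arg)
        self.log.append({"rollout_id": self.rollout_id, "tool": name,
                         "args": arg, "result": result})
        return result


# Toy corpus mirroring the demo's tools: a directory listing and a "cat" tool.
corpus = {"reports/acme_2023.txt": "ACME 2023 revenue: $1.2B"}
tools = {
    "list_dir": lambda path: "\n".join(k for k in corpus if k.startswith(path)),
    "cat": lambda path: corpus.get(path, ""),
}

session = RolloutSession(tools)
path = session.call("list_dir", "reports/")
text = session.call("cat", path)
```

Because every log entry carries the same `rollout_id`, a grader running in the customer's own environment can match the final answer back to the exact sequence of tool calls that produced it.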
The rollout signals that enterprises can now train agents that are tightly aligned with proprietary toolsets and latency constraints, reducing inference costs while boosting task‑specific performance. By exposing the training loop to real‑world endpoints, OpenAI gives developers the flexibility to define business‑critical reward functions, paving the way for more reliable, production‑ready autonomous agents across sectors such as code assistance, customer service and financial analysis.