Production Sub-Agents for LLM Post Training

MLOps Community · Apr 10, 2026

Why It Matters

Accelerating LLM post‑training from weeks to days cuts costs and unlocks rapid deployment of reliable, agentic AI products, giving firms a competitive edge.

Key Takeaways

  • Sub‑agent architecture cuts LLM post‑training cycle from weeks to ~one week.
  • Swarm agents cause context bloat and scaling bottlenecks; sub‑agents avoid them.
  • Structured skills.md and tool‑calling 2.0 reduce token usage by up to 50%.
  • Custom memory API with pruning/compression improves long‑horizon training stability.
  • Open‑source tools like Miniax 2.5 and Agent SDK lower cost and orchestration complexity.

Summary

The talk introduced a new production workflow for post‑training large language models, championed by Pinterest’s growth AI lead. Traditional pipelines required a linear, manual sequence—data cleaning, model selection, hyper‑parameter tuning, evaluation loops, and reinforcement learning—taking four to six weeks. By leveraging Claude’s sub‑agent framework and a custom agent SDK, the team compressed this timeline to roughly one week.

Key insights include the contrast between sub‑agents and Claude’s swarm mode. Swarm agents quickly hit context‑window limits and suffer from the “hot celebrity” scaling issue, whereas sub‑agents communicate solely through a central orchestrator, keeping each agent’s context manageable. The workflow also replaces ad‑hoc natural‑language instructions with a structured skills.md file and introduces “tool‑calling 2.0,” allowing agents to generate and execute their own code, slashing token consumption by up to 50%.
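The hub‑and‑spoke topology described above can be sketched in a few lines of Python. This is a minimal illustration, not code from Claude’s sub‑agent framework or the Agent SDK: the `SubAgent`, `Orchestrator`, and agent names are all invented for this sketch, and the `run` method stands in for a real LLM call.

```python
import concurrent.futures
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A worker with its own isolated context window."""
    name: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # In a real system this would call an LLM with only this
        # agent's private context plus the task prompt.
        self.context.append(task)
        return f"{self.name}: completed '{task}'"

class Orchestrator:
    """Central hub: sub-agents never talk to each other directly,
    so no agent's context grows with the whole swarm's chatter."""
    def __init__(self, agents: dict[str, SubAgent]):
        self.agents = agents
        self.summary: list[str] = []  # compact results only, not transcripts

    def dispatch(self, assignments: dict[str, str]) -> list[str]:
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(self.agents[name].run, task)
                for name, task in assignments.items()
            ]
            results = [f.result() for f in futures]
        self.summary.extend(results)  # orchestrator keeps the global view
        return results

agents = {s: SubAgent(s) for s in ("data_cleaning", "eval", "rl_tuning")}
orchestrator = Orchestrator(agents)
results = orchestrator.dispatch({
    "data_cleaning": "dedupe SFT dataset",
    "eval": "run benchmark suite",
    "rl_tuning": "sweep KL penalty",
})
```

The point of the pattern is visible in the data flow: each `SubAgent.context` holds only its own task, while the orchestrator accumulates short results, which is how the architecture avoids the context bloat of all‑to‑all swarm communication.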

Concrete examples highlighted cost‑effective alternatives like Miniax 2.5 (≈0.27¢/token) and a bespoke memory API that prunes and compresses episodic logs, mitigating memory collapse after many epochs. The speaker cited a 50–70% reduction in token usage and emphasized the importance of custom reward models for memory writes.
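The prune‑and‑compress idea can be sketched as a toy episodic store. Everything here is an assumption for illustration: the talk did not describe its memory API’s interface, the `EpisodicMemory` class and its reward gate are invented, and the truncation stands in for real LLM‑generated summaries.

```python
from collections import deque

class EpisodicMemory:
    """Toy memory store: keeps the newest entries verbatim and
    collapses older ones into a summary list to bound context size."""
    def __init__(self, max_verbatim: int = 4):
        self.recent: deque[str] = deque(maxlen=max_verbatim)
        self.compressed: list[str] = []  # stand-in for LLM-written summaries

    def write(self, entry: str, reward: float, threshold: float = 0.5) -> bool:
        # Gate writes on a reward score, echoing the talk's point about
        # custom reward models for memory writes (scoring here is a stub).
        if reward < threshold:
            return False
        if len(self.recent) == self.recent.maxlen:
            # Compress the entry that is about to be evicted; real systems
            # would summarize it rather than truncate.
            self.compressed.append(self.recent[0][:32])
        self.recent.append(entry)  # deque drops the oldest automatically
        return True
```

Bounding the verbatim window while retaining compressed traces of older episodes is one simple way to keep long‑horizon runs from degrading as logs accumulate across epochs.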

The implications are clear: faster iteration cycles, reduced cloud spend, and more reliable model behavior enable enterprises to deploy agentic AI products—such as CRM assistants or threat‑hunting tools—at production scale. Ongoing research into learned memory architectures and hybrid retrieval promises further stability for long‑horizon AI development.

Original Description

Faye Zhang (Pinterest) Lightning Talk at the Coding Agents Conference at the Computer History Museum, March 3rd, 2026.
Abstract //
Training models used to take weeks. Faye Zhang cut it to days with sub-agents, but the catch is brutal: more agents mean more chaos, memory issues, drift, and broken workflows, so the real game isn’t faster training, it’s controlling the mess you just created.
Bio //
Faye Zhang is a staff AI engineer and tech lead at Pinterest, where she leads multimodal AI work for search, traffic discovery, and shopping, driving platform growth globally.
