How RecursiveMAS Speeds up Multi-Agent Inference by 2.4x and Reduces Token Usage by 75%
Why It Matters
RecursiveMAS removes the latency and expense of token‑by‑token communication, unlocking scalable, cost‑effective multi‑agent AI for real‑world applications.
Key Takeaways
- •RecursiveMAS speeds inference 1.2‑2.4× over text‑based agents.
- •Token usage drops up to 75% by third recursion round.
- •Only 13 M parameters (0.31%) trainable, halving GPU cost.
- •Improves benchmark accuracy by 8.3% versus strongest baselines.
- •Enables multi‑agent loops without loading duplicate model copies.
Pulse Analysis
Multi‑agent AI promises to tackle tasks that single models struggle with, but most systems still rely on sequential text generation to share intermediate reasoning. This text‑centric approach creates a cascade of latency, inflates token counts, and makes end‑to‑end training prohibitively expensive, especially when dozens of models must wait on each other's outputs. RecursiveMAS flips the paradigm by moving the dialogue into the latent embedding space, allowing agents to pass high‑dimensional representations directly, akin to a neural “telepathy” that eliminates unnecessary decoding steps.
The core innovation lies in the RecursiveLink modules—tiny two‑layer adapters that translate hidden states between agents and recycle them within each model’s reasoning loop. Because the underlying language models remain frozen, only the RecursiveLinks—about 13 million parameters, roughly 0.31% of the total—are updated during training. This dramatically reduces GPU memory pressure and slashes training budgets by more than 50% compared with full fine‑tuning or LoRA approaches. Moreover, the token economy improves sharply: the framework cuts token usage by 34.6% in the first recursion round and by over 75% by the third, while delivering 1.2‑2.4× faster inference.
Empirical results across nine diverse benchmarks—including code generation, medical reasoning, and search‑based QA—show RecursiveMAS delivering an average 8.3% accuracy uplift over the strongest baselines, with standout gains of 18% on complex math challenges. These efficiency and performance gains make the technology attractive for enterprises seeking to deploy sophisticated agentic pipelines without prohibitive compute costs. The team has open‑sourced the code and pretrained weights under an Apache 2.0 license, positioning RecursiveMAS as a practical blueprint for scalable, cost‑effective multi‑agent AI in production environments.
How RecursiveMAS speeds up multi-agent inference by 2.4x and reduces token usage by 75%
Comments
Want to join the conversation?
Loading comments...