Researchers Automated LLM Reasoning Strategy Design and Cut Token Usage by 69.5%

VentureBeat, May 28, 2026

Why It Matters

AutoTTS replaces costly manual tuning of inference budgets with a fast, low-cost automated search, so businesses can lower AI operating expenses while getting higher accuracy from the models they already run.

Key Takeaways

  • AutoTTS cuts token usage by up to 69.5% without accuracy loss
  • Discovering the strategy via offline replay cost $39.90 and took 160 minutes
  • The Confidence Momentum Controller uses an EMA of confidence and coupled width-depth decisions
  • Enterprise teams can auto-tune inference budgets for proprietary LLMs

Pulse Analysis

Test-time scaling (TTS) has become a go-to technique for improving large language model reasoning by allocating extra compute during inference. Until now, TTS policies have been handcrafted: engineers set static thresholds for branching, pruning, and stopping, which leaves much of the resource-allocation space unexplored. Researchers from Meta, Google and academia addressed this gap with AutoTTS, a framework that treats strategy design as an algorithmic search problem. By automating the discovery of TTS controllers, AutoTTS promises enterprise-grade efficiency without the trial-and-error overhead of manual tuning.

The core of AutoTTS is an explorer LLM agent, such as Claude Code, that iteratively writes and evaluates controller code inside an offline replay environment. Thousands of pre-generated reasoning trajectories serve as a cheap sandbox, allowing the explorer to test candidate policies without invoking the base model for each run. One standout policy, the Confidence Momentum Controller, departs from naive checks on instantaneous confidence: it tracks an exponential moving average of confidence, couples width and depth decisions, and allocates extra compute to branches that align with the emerging consensus. Rules this nuanced emerge because the agent is not constrained by human intuition.
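
To make the two ideas above concrete, here is a minimal Python sketch of an EMA-based stopping rule scored by offline replay. It is not the released AutoTTS code or the actual Confidence Momentum Controller: the `Trajectory` schema, the `EmaStopController` and `replay` names, and the `alpha` and `threshold` values are all assumptions made for illustration, and the width-depth coupling described in the article is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Trajectory:
    """One pre-generated reasoning trace (hypothetical schema, assumed non-empty)."""
    confidences: List[float]    # model confidence recorded after each step
    answers: List[str]          # answer the trace would give if stopped at that step
    tokens_per_step: List[int]  # tokens spent on each step
    gold: str                   # reference answer


@dataclass
class EmaStopController:
    """Stops once an exponential moving average of confidence reaches a threshold."""
    alpha: float = 0.5      # EMA smoothing factor (assumed value)
    threshold: float = 0.7  # stopping threshold (assumed value)

    def steps_used(self, confidences: Sequence[float]) -> int:
        # Starting at zero means that, while alpha < threshold, a single
        # early confidence spike cannot trigger a stop on its own.
        ema = 0.0
        for step, conf in enumerate(confidences, start=1):
            ema = self.alpha * conf + (1.0 - self.alpha) * ema
            if ema >= self.threshold:
                return step
        return len(confidences)  # threshold never reached: use the whole trace


def replay(controller: EmaStopController,
           trajectories: Sequence[Trajectory]) -> Tuple[float, float]:
    """Score a controller on stored traces; returns (accuracy, mean tokens)."""
    correct = 0
    tokens = 0
    for traj in trajectories:
        n = controller.steps_used(traj.confidences)
        tokens += sum(traj.tokens_per_step[:n])
        correct += int(traj.answers[n - 1] == traj.gold)
    return correct / len(trajectories), tokens / len(trajectories)
```

Because `replay` only reads stored traces, a search loop could score many candidate controllers, for example different `alpha` and `threshold` pairs, without generating a single new token from the base model. A check on instantaneous confidence with the same threshold would stop at the first step of a trace such as `[0.9, 0.1, 0.9, 0.9]`, whereas the smoothed rule above waits until step 4.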

Empirical results on Qwen-3 models (0.6B to 8B parameters) and a distilled DeepSeek-R1 model show AutoTTS cutting token consumption by up to 69.5% while preserving, or even improving, accuracy on benchmarks such as AIME 24, AIME 25, HMMT 25 and GPQA-Diamond. The discovery process itself cost only $39.90 and took 160 minutes, making custom TTS optimization accessible to teams without deep research budgets. With the code released on GitHub, enterprises can plug the Confidence Momentum Controller into existing pipelines, gaining both lower inference costs and higher peak performance for proprietary AI applications.
