Google’s New AI Training Method Helps Small Models Tackle Complex Reasoning

VentureBeat AI
Nov 14, 2025

Why It Matters

Supervised Reinforcement Learning (SRL) brings high-level reasoning to modestly sized models without extra inference cost, which widens the pool of AI agents that can be deployed for enterprise automation and high-stakes applications. That reduces dependence on costly large models and could speed AI adoption in areas such as data-science automation, software development, and supply-chain optimization.

Summary

Researchers from Google Cloud and UCLA unveiled Supervised Reinforcement Learning (SRL), a framework that reformulates problem solving as a sequence of intermediate actions and rewards the model at each step rather than only on the final answer. With SRL, 7-billion-parameter models such as Qwen2.5-7B-Instruct gain a 3% lift on elite math benchmarks and reach a 14.8% task-resolve rate on software-engineering challenges, 74% higher than with supervised fine-tuning, while keeping token usage comparable to the base model. The approach also serves as a curriculum: applying SRL first boosts subsequent outcome-based RL fine-tuning (reinforcement learning with verifiable rewards, RLVR) by 3.7% on average. Overall, SRL narrows the performance gap for smaller, cheaper models on complex multi-step reasoning tasks.
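
To make the idea of step-wise rewards concrete, here is a minimal sketch, not the researchers' implementation. It assumes each step's reward is a text-similarity score between the model's action and the matching expert action; the helper names (`step_reward`, `stepwise_rewards`), the toy math steps, and the use of Python's `difflib` as the similarity measure are all illustrative assumptions. It also contrasts the result with an outcome-only reward, which gives no credit when the final answer is wrong.

```python
from difflib import SequenceMatcher


def step_reward(predicted_action: str, expert_action: str) -> float:
    """Similarity in [0, 1] between a predicted action and the expert action.

    Assumption: plain text similarity stands in for whatever scoring
    the actual framework uses.
    """
    return SequenceMatcher(None, predicted_action, expert_action).ratio()


def stepwise_rewards(predicted_steps: list[str], expert_steps: list[str]) -> list[float]:
    """Return one reward per expert step; a missing predicted step scores 0."""
    rewards = []
    for i, expert in enumerate(expert_steps):
        predicted = predicted_steps[i] if i < len(predicted_steps) else ""
        rewards.append(step_reward(predicted, expert))
    return rewards


# Hypothetical expert demonstration and model attempt for "solve 2x + 3 = 13".
expert = ["subtract 3: 2x = 10", "divide by 2: x = 5"]
model = ["subtract 3: 2x = 10", "divide by 2: x = 4"]

rewards = stepwise_rewards(model, expert)

# Outcome-only reward: all or nothing, based on the final step alone.
outcome_reward = 1.0 if model[-1] == expert[-1] else 0.0

print("step-wise rewards:", rewards)
print("outcome-only reward:", outcome_reward)
```

In this toy case the first step earns full credit and the second earns partial credit, while the outcome-only signal is zero. That denser feedback is the kind of learning signal step-wise rewards are meant to supply on problems a small model rarely solves end to end.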
