Google’s New AI Training Method Helps Small Models Tackle Complex Reasoning

VentureBeat AI • November 14, 2025

Companies Mentioned

Google (GOOG)

DeepSeek

Alibaba Group (BABA)

Why It Matters

Supervised Reinforcement Learning (SRL) delivers high‑level reasoning to modestly sized models without extra inference cost, expanding the pool of AI agents that can be deployed for enterprise automation and high‑stakes applications. This reduces dependence on costly large models and accelerates AI adoption in domains such as data‑science automation, software development, and supply‑chain optimization.

Summary

Researchers from Google Cloud and UCLA unveiled Supervised Reinforcement Learning (SRL), a framework that reformulates problem solving as a sequence of intermediate actions and rewards the model at each step rather than only on the final answer. With SRL, 7‑billion‑parameter models such as Qwen2.5‑7B‑Instruct gain a 3% lift on elite math benchmarks and reach a 14.8% task‑resolve rate on software‑engineering challenges, 74% higher than with supervised fine‑tuning, while keeping token usage comparable to the base model. The approach also serves as a curriculum: applying it before outcome‑based RL with verifiable rewards (RLVR) improves results by 3.7% on average. Overall, SRL narrows the performance gap for smaller, cheaper models on complex multi‑step reasoning tasks.
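To make the idea of step‑wise rewards concrete, here is a minimal sketch of how a model's intermediate actions could be scored against an expert trajectory. It is an illustration under stated assumptions, not the researchers' implementation: the function names, the toy trajectory, and the choice of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def step_reward(predicted_action: str, expert_action: str) -> float:
    """Similarity in [0, 1] between a model action and the expert action.

    Assumption: plain string similarity stands in for whatever
    step-level reward the actual framework uses.
    """
    return SequenceMatcher(None, predicted_action, expert_action).ratio()


def stepwise_rewards(predicted_steps, expert_steps):
    """Return one reward per expert step; missing model steps score 0.0."""
    rewards = []
    for i, expert in enumerate(expert_steps):
        predicted = predicted_steps[i] if i < len(predicted_steps) else ""
        rewards.append(step_reward(predicted, expert))
    return rewards


# Hypothetical expert trajectory and a model attempt that goes wrong
# only at the last step.
expert = ["factor x^2 - 5x + 6", "set (x - 2)(x - 3) = 0", "x = 2 or x = 3"]
model = ["factor x^2 - 5x + 6", "set (x - 2)(x - 3) = 0", "x = 1"]

rewards = stepwise_rewards(model, expert)
print(rewards)
```

In this toy case an outcome‑only reward would score the whole attempt as a failure because the final answer is wrong, whereas the step‑wise scores still credit the two correct intermediate actions.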
