AI

A New Google AI Research Proposes Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total Inference Costs by Half

MarkTechPost • February 22, 2026

Companies Mentioned

  • Google (GOOG)
  • DeepSeek

Why It Matters

DTR provides a more reliable signal for LLM effectiveness, enabling cost‑efficient inference strategies that maintain or improve accuracy, a critical advantage for large‑scale AI deployments.

Key Takeaways

  • Longer token sequences often reduce LLM accuracy.
  • Deep-Thinking Ratio (DTR) correlates positively with model performance.
  • Think@n cuts inference cost by roughly fifty percent.
  • Early halting discards low-DTR candidates after a 50-token prefix.
  • The metric leverages internal layer dynamics, not just output length.

Pulse Analysis

Since the rise of chain‑of‑thought prompting, practitioners have equated longer reasoning traces with higher quality answers. Empirical studies, however, reveal a paradox: extending token sequences frequently leads to lower accuracy, a phenomenon the University of Virginia and Google attribute to ‘overthinking.’ Their new metric, the Deep‑Thinking Ratio (DTR), shifts the focus from surface length to the depth of internal computation. By quantifying how many tokens only stabilize in the final layers of a transformer, DTR offers a more faithful proxy for the cognitive effort a model expends on a problem.

The authors identify deep-thinking tokens by projecting each layer's hidden state onto the vocabulary space and measuring the Jensen-Shannon divergence against the final-layer distribution. Tokens whose predictions keep shifting into the last 15% of layers are flagged as deep-thinking, and the proportion of such tokens defines the DTR. Across models ranging from DeepSeek-R1-70B to GPT-OSS-120B, DTR exhibits an average Pearson correlation of +0.68 with benchmark accuracy, in stark contrast to the –0.59 correlation observed for raw token count. This positive relationship holds across math, logic, and code-generation tasks, confirming DTR as a robust performance indicator.
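The layer-wise projection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the JSD threshold, and the 85% depth cutoff are assumptions chosen to mirror the description (a token counts as deep-thinking if its layer-wise prediction still diverges from the final-layer distribution within the last 15% of layers).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def deep_thinking_ratio(hidden_states, W_unembed,
                        depth_frac=0.85, jsd_threshold=0.1):
    """Illustrative DTR estimate (thresholds are assumptions).

    hidden_states: [num_layers, num_tokens, d_model]
    W_unembed:     [d_model, vocab_size]
    A token is 'deep-thinking' if its layer-wise prediction still
    diverges from the final-layer distribution past depth_frac of
    the network's depth.
    """
    num_layers, num_tokens, _ = hidden_states.shape
    cutoff = int(depth_frac * num_layers)
    # Logit-lens style: project every layer's state to vocabulary space.
    probs = softmax(hidden_states @ W_unembed)   # [L, T, V]
    final = probs[-1]                            # [T, V]
    deep = 0
    for t in range(num_tokens):
        # Find the last layer whose distribution still diverges
        # from the final-layer distribution.
        last_unstable = -1
        for layer in range(num_layers):
            if js_divergence(probs[layer, t], final[t]) > jsd_threshold:
                last_unstable = layer
        if last_unstable >= cutoff:
            deep += 1
    return deep / num_tokens
```

In practice the hidden states would come from a real transformer's residual stream and `W_unembed` would be the model's output embedding matrix; the sketch only shows the shape of the computation.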

Building on DTR, the Think@n inference strategy halts low‑scoring candidates after a 50‑token prefix, allocating compute only to high‑DTR samples. On the AIME‑25 math benchmark, Think@n achieved 94.7 % accuracy while cutting average token consumption from 307.6 k to 155.4 k, a 49 % cost reduction versus traditional self‑consistency voting. For enterprises deploying large language models at scale, this translates into substantial savings on GPU hours and faster response times without sacrificing quality. The approach also opens avenues for dynamic inference pipelines that adaptively allocate resources based on internal model signals rather than arbitrary length thresholds.
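The halting loop described above can be outlined in a few lines. This is a hedged sketch of a Think@n-style pipeline, not the authors' code: `think_at_n` and its callable parameters (`generate_prefix`, `score_dtr`, `complete`) are hypothetical names, and the keep fraction is an assumption.

```python
from collections import Counter

def think_at_n(prompt, n=8, prefix_tokens=50, keep_frac=0.5,
               generate_prefix=None, score_dtr=None, complete=None):
    """Illustrative Think@n-style inference (parameter names assumed):
    1. sample n reasoning prefixes of `prefix_tokens` tokens each,
    2. score every prefix with a DTR estimate,
    3. halt the low-DTR candidates early,
    4. complete only the survivors and majority-vote their answers.
    """
    prefixes = [generate_prefix(prompt, prefix_tokens) for _ in range(n)]
    # Keep the highest-DTR candidates; the rest are discarded after
    # only `prefix_tokens` tokens of generation.
    scored = sorted(prefixes, key=score_dtr, reverse=True)
    survivors = scored[:max(1, int(keep_frac * n))]
    answers = [complete(prompt, p) for p in survivors]
    return Counter(answers).most_common(1)[0][0]
```

Compared with plain self-consistency voting, the saving comes from step 3: roughly half the candidates consume only a 50-token prefix instead of a full reasoning trace.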
