Understanding when chain‑of‑thought harms accuracy helps practitioners avoid unnecessary prompting, leading to more reliable AI deployments and better alignment with human cognitive strengths.
Chain‑of‑thought prompting has become a go‑to technique for eliciting logical reasoning from large language models. By asking a model to “think step‑by‑step,” developers have achieved impressive gains on arithmetic, commonsense, and multi‑hop reasoning benchmarks. However, recent research from Tom Griffiths and colleagues reminds us that this strategy is not universally beneficial. Their experiments show that when a task is fundamentally intuitive—such as recognizing a familiar face or applying grammatical rules without explicit analysis—the extra verbalization interferes with the model’s pattern‑matching behavior, leading to measurable drops in accuracy.
The phenomenon mirrors a well‑documented cognitive bias called verbal overshadowing, where describing a perceptual experience degrades memory or judgment. In the human literature, participants who verbalize what they see often perform worse on subsequent recognition tests. Griffiths’ work extends this effect to large language models, suggesting that LLMs share a similar reliance on fast, System 1‑style processing for certain inputs. By converting an internal representation into a textual chain, the model introduces noise that disrupts the compact embeddings that usually drive high‑confidence predictions.
For AI product teams, the takeaway is clear: one size does not fit all when it comes to prompting. Before defaulting to chain‑of‑thought, practitioners should benchmark both prompted and unprompted versions on the target dataset, especially for tasks rooted in visual perception, language fluency, or other instinctive domains. This dual‑evaluation approach can surface hidden performance cliffs and guide the design of hybrid pipelines that switch between intuitive and analytical modes. As research progresses, we can expect more nuanced prompting frameworks that dynamically assess whether a problem benefits from explicit reasoning or should be left to the model’s innate pattern‑recognition abilities.
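The dual‑evaluation approach above can be sketched in a few lines. The snippet below is a minimal, illustrative harness, not a definitive implementation: `query_model` is a hypothetical stand‑in for your actual model call (e.g. an API client), and the one‑item dataset is dummy data.

```python
def query_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call; returns a canned answer."""
    return "B"

def evaluate(dataset: list[tuple[str, str]], use_cot: bool) -> float:
    """Score the model on (question, gold answer) pairs, with or without
    a chain-of-thought instruction appended to the prompt."""
    correct = 0
    for question, gold in dataset:
        prompt = question
        if use_cot:
            prompt += "\nLet's think step by step."
        answer = query_model(prompt)
        correct += (answer.strip() == gold)
    return correct / len(dataset)

# Dummy example of an intuitive, perception-style task.
dataset = [("Which face matches the suspect? (A/B)", "B")]

direct_acc = evaluate(dataset, use_cot=False)
cot_acc = evaluate(dataset, use_cot=True)

# Adopt chain-of-thought only where it actually helps on this task.
use_cot_in_production = cot_acc > direct_acc
print(f"direct={direct_acc:.2f} cot={cot_acc:.2f} use_cot={use_cot_in_production}")
```

Running both variants on the same held‑out set makes any chain‑of‑thought performance cliff visible before deployment, and the resulting flag can drive the kind of hybrid pipeline described above.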