AI Models Would Rather Guess than Ask for Help, Researchers Find

THE DECODER
Apr 11, 2026

Why It Matters

The study highlights a critical blind spot in current AI: the failure to recognize uncertainty. That blind spot limits reliable deployment in real‑world vision tasks, where missing or degraded data is common. Demonstrating a viable training path suggests that future models could become trustworthy collaborators rather than guesswork generators.

Key Takeaways

  • ProactiveBench reveals most multimodal models rarely ask for missing info
  • Accuracy drops from ~80% to under 20% when visual data is hidden
  • Larger models perform worse than smaller ones on proactive tasks
  • Reinforcement learning fine‑tuning raises proactive accuracy to ~38%
  • Improper reward balance causes models to spam help requests

Pulse Analysis

The launch of ProactiveBench marks a pivotal step toward evaluating AI’s ability to admit ignorance. By converting seven existing visual datasets into scenarios that demand human assistance—such as occluded objects, noisy images, or ambiguous sketches—the benchmark forces models to decide whether to guess or request clarification. This shift from pure accuracy metrics to a measure of self‑awareness aligns with broader industry calls for responsible AI that can signal uncertainty rather than fabricate answers.
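
To make that decision concrete, a benchmark item of this kind can be scored along the following lines. This is a minimal Python sketch under assumed conventions: the BenchmarkItem fields, the HELP_PHRASES keyword list, and the scoring rules are illustrative stand-ins, not ProactiveBench's actual implementation.

```python
# Minimal sketch of scoring a ProactiveBench-style item.
# All names here (BenchmarkItem, HELP_PHRASES, score_response) are
# illustrative assumptions, not the benchmark's actual code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenchmarkItem:
    question: str          # e.g. "What object is behind the mask?"
    answer: Optional[str]  # None when the image no longer contains
                           # enough information to answer

HELP_PHRASES = ("need more information", "could you clarify",
                "please provide", "can you show")

def asked_for_help(response: str) -> bool:
    """Crude keyword check for a help request in the model's output."""
    return any(p in response.lower() for p in HELP_PHRASES)

def score_response(item: BenchmarkItem, response: str) -> float:
    """1.0 for the right behavior, 0.0 otherwise: ask for help when the
    answer is unrecoverable, answer correctly when it is."""
    if item.answer is None:
        # Guessing on an unanswerable item earns nothing.
        return 1.0 if asked_for_help(response) else 0.0
    if asked_for_help(response):
        # Unnecessary help request on a solvable item.
        return 0.0
    return 1.0 if item.answer.lower() in response.lower() else 0.0
```

Under a rubric like this, a model that always guesses scores zero on every unanswerable item, which is roughly the collapse the benchmark reports.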

Findings from the benchmark are sobering. Even state‑of‑the‑art multimodal systems like GPT‑4.1 and Qwen2.5‑VL see their performance plunge from roughly 80% on standard tasks to under 20% when asked to identify hidden objects. Counterintuitively, larger models often lag behind their smaller counterparts, suggesting that sheer parameter count does not translate into better uncertainty handling. Nor do prompting tricks help: explicit instructions and seeded conversation histories fail to elicit genuine proactivity, and when presented with bogus answer options, models simply pick one rather than ask, exposing a shallow notion of “asking.”

The most promising development is the successful application of reinforcement learning, specifically Group‑Relative Policy Optimization, to teach models when to request help. Fine‑tuned versions of LLaVA‑NeXT‑Mistral‑7B and Qwen2.5‑VL‑3B achieved up to 38% proactive accuracy and transferred this skill to unseen tasks, hinting at scalable solutions. However, the delicate balance of reward signals remains crucial—over‑rewarding help requests leads to spammy behavior. As enterprises look to integrate vision‑enabled AI into safety‑critical workflows, the ability to recognize and communicate uncertainty will become a decisive competitive advantage.
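
The reward-balance problem has a simple shape. The sketch below pairs a hypothetical reward function with the group-relative advantage step that gives GRPO its name; the specific weights (help_bonus, spam_penalty) are assumptions chosen to illustrate the failure mode, not values from the paper.

```python
import statistics

def proactive_reward(answerable: bool, asked_for_help: bool,
                     answered_correctly: bool,
                     help_bonus: float = 0.5,
                     spam_penalty: float = 0.5) -> float:
    """Hypothetical reward: pay for asking only when data is missing."""
    if not answerable:
        # Asking is the only correct move when the image lacks the answer.
        return help_bonus if asked_for_help else -1.0
    if asked_for_help:
        # Without this penalty, a large help_bonus teaches the policy
        # to spam "please clarify" on every input.
        return -spam_penalty
    return 1.0 if answered_correctly else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: each sampled response is judged against
    the mean reward of its own group, normalized by the std."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```

If help_bonus outweighs the reward for a correct answer, help requests dominate every sampled group and the advantage signal pushes the policy toward asking indiscriminately, which is exactly the spammy behavior the researchers describe.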
