THE DECODER

News, business insights, and research updates on artificial intelligence

News • Mar 30, 2026

AI Models Confidently Describe Images They Never Saw, and Benchmarks Fail to Catch It

A new study finds that leading multimodal AI models, including the GPT-5 series, Gemini 3 Pro, and Claude Opus 4.5, confidently generate visual descriptions and medical diagnoses even when they never receive an image, achieving 60-90% correctness on a text-only benchmark called Phantom-0. On established visual-understanding benchmarks, the models reach 70-80% of their full scores from textual cues alone, and on medical tests up to 99% of their image-based performance. A 3-billion-parameter text-only model fine-tuned on a chest X-ray dataset outperformed all frontier multimodal systems and radiologists by more than 10%. The researchers propose the B-Clean framework, which strips out questions that can be solved without images, to expose inflated rankings and reshape how models are evaluated.
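The two ideas in the summary, measuring how much of a score survives without images and filtering out questions that can be answered from text alone, can be sketched in a few lines of Python. This is only an illustration of the general idea, not the study's B-Clean procedure: the `Question` record, its fields, and both helper functions are hypothetical, and the four sample items are invented.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One benchmark item (hypothetical structure, not from the study)."""
    qid: str
    with_image_correct: bool   # model was right when given the image
    text_only_correct: bool    # model was right with the image withheld


def accuracy(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def text_only_retention(questions):
    """Share of the image-based score that survives without images."""
    full = accuracy(q.with_image_correct for q in questions)
    blind = accuracy(q.text_only_correct for q in questions)
    return blind / full if full else 0.0


def drop_text_solvable(questions):
    """Keep only items the model could not answer without the image."""
    return [q for q in questions if not q.text_only_correct]


# Invented sample data, for illustration only.
questions = [
    Question("q1", True, True),
    Question("q2", True, True),
    Question("q3", True, False),
    Question("q4", False, False),
]

retention = text_only_retention(questions)
cleaned = drop_text_solvable(questions)
print(f"text-only retention: {retention:.2f}")
print("kept after filtering:", [q.qid for q in cleaned])
```

A retention value near 1.0 would suggest the benchmark barely depends on the images, while the filtered list is the subset on which a ranking actually reflects visual understanding.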

MetaClaw Framework Trains AI Agents While You're in Meetings by Checking Your Google Calendar

Google's New Gemini API Agent Skill Patches the Knowledge Gap AI Models Have with Their Own SDKs

Meta's Hyperagents Improve at Tasks and Improve at Improving

Cohere Releases Open Source Model that Tops Speech Recognition Benchmarks

Suno 5.5 Lets Users Sing Their Own AI-Generated Songs with a Personalized Voice Feature

OpenAI CEO Sam Altman Reportedly Teases a "Very Strong" Model Internally that Can "Really Accelerate the Economy"

OpenAI Expands Its Record Funding Round to over $120 Billion as It Eyes a Potential IPO Later This Year

Popular AI Proxy LiteLLM Got Hacked with Malware that Spreads Through Kubernetes Clusters

Google DeepMind's Gemini 3.1 Flash-Lite Generates Websites Almost in Real Time

Google Brings AI-Powered Dark Web Analysis to Enterprise Security Teams

OpenAI Wants UK Regulators to Treat ChatGPT as a Google Search Alternative

Xiaomi Launches Three MiMo AI Models to Power Agents, Robots, and Voice

Andrej Karpathy Says Humans Are Now the Bottleneck in AI Research with Easy-to-Measure Results

OpenAI Publishes a Prompting Playbook that Helps Designers Get Better Frontend Results From GPT-5.4

Terence Tao Says AI Drives Idea Generation Cost to Near Zero but Shifts the Bottleneck to Verification

Qualcomm Shrinks AI Reasoning Chains by 2.4x to Fit Thinking Models on Smartphones

ElevenLabs Now Lets You Sell AI Music You Don't Own

Microsoft's Superintelligence Team Ships MAI-Image-2, a Text-to-Image Generator

Midjourney V8 Rolls Out with 5x Faster Generation but Charges 4x More for Its Best Features

Microsoft Restructures AI Division to Chase Superintelligence After Nadella Once Called AI Models a Commodity

OpenAI Reportedly Ditches Its "Side Quests" Strategy to Focus on Coding Tools and Business Customers

AI-Generated War Footage Is Going Viral While Real Satellite Imagery Disappears From Public View

RL Agents Go From Face-Planting to Parkour when Researchers Keep Adding Network Layers

Hume AI Open-Sources TADA, a Speech Model Five Times Faster than Rivals with Zero Hallucinated Words

AI Chips Are Pushing Everything Else Off TSMC's Most Advanced Production Lines

Grok 4.20 Trails Gemini and GPT-5.4 by a Wide Margin but Sets a New Record for Not Hallucinating

US War Department CTO Says Anthropic's AI Models "Pollute" the Supply Chain with Built-In Ethics

OpenAI Is Reportedly Planning to Integrate Its Video AI Sora Into ChatGPT

Claude's Excel and PowerPoint Add-Ins Now Share Context Across Apps

OpenAI's New Training Dataset Teaches AI Models Which Instructions to Trust

German Court Says "It's AI" Isn't Enough to Void Copyright

Amazon Makes Senior Engineers the Human Filter for AI-Generated Code After a Series of Outages

Meta Acquires Moltbook, the Reddit-Style Platform Built for AI Agents

Philosopher David Chalmers: Current AI Interpretability Methods Miss What Matters Most

OpenAI Employees Hint at a New Omni Model

Luma AI's New Uni-1 Image Model Tops Nano Banana 2 and GPT Image 1.5 on Logic-Based Benchmarks

Trump Administration Drafts AI Contract Rules Requiring Companies to License Systems for "All Lawful Use"

When Language Models Hallucinate, They Leave "Spilled Energy" in Their Own Math

OpenAI Offers Open-Source Maintainers Six Months of Free ChatGPT Pro and Codex Access

ByteDance's Open-Weight Helios Model Brings Minute-Long AI Video Generation Close to Real Time

Anthropic Turns Claude Code Into a Background Worker with Local Scheduled Tasks

Anthropic's New Marketplace Lets Enterprise Customers Spend Their Existing AI Budget on Third-Party Tools

Yann LeCun Wants to Replace the AGI Concept with "Superhuman Adaptable Intelligence"

Alibaba's Chief AI Developer Quits, Taking Key Team Members with Him

OpenAI's Codex App Lands on Windows After Topping a Million Mac Downloads in Its First Week

ASML Plans to Expand Beyond Chip Lithography Into Advanced Packaging

ElevenLabs and Google Dominate Artificial Analysis' Updated Speech-to-Text Benchmark

Moltbook's Alleged AI Civilization Is Just a Massive Void of Bloated Bot Traffic

Even Frontier LLMs From GPT-5 Onward Lose up to 33% Accuracy when You Chat Too Long
