
AI Models Confidently Describe Images They Never Saw, and Benchmarks Fail to Catch It
A new study reveals that leading multimodal AI models, including the GPT‑5 series, Gemini 3 Pro, and Claude Opus 4.5, confidently generate visual descriptions and medical diagnoses despite never receiving an image, achieving 60‑90% correctness in a text‑only benchmark called Phantom‑0. When tested on established visual‑understanding benchmarks, these models attain 70‑80% of their full scores using only textual cues, with medical tests reaching up to 99% of image‑based performance. A 3‑billion‑parameter text‑only model fine‑tuned on a chest‑X‑ray dataset outperformed all frontier multimodal systems and radiologists by over 10%. The researchers propose the B‑Clean framework to strip away questions solvable without images, exposing inflated rankings and reshaping model evaluation.
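The filtering idea in the summary above, dropping benchmark questions that can be answered without the image, can be illustrated with a minimal sketch. This is not the researchers' B‑Clean implementation, whose details are not given here. The `BenchmarkItem` type, the `keep_image_dependent` function, the exact-match comparison, and the stub model are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkItem:
    question: str
    answer: str
    image_path: str  # never opened here; the filter deliberately ignores it


def keep_image_dependent(
    items: List[BenchmarkItem],
    answer_without_image: Callable[[str], str],
) -> List[BenchmarkItem]:
    """Keep only items a text-only responder gets wrong.

    Assumption: answers are compared by case-insensitive exact match.
    A real evaluation would likely need a more careful comparison.
    """
    kept = []
    for item in items:
        guess = answer_without_image(item.question)
        if guess.strip().lower() != item.answer.strip().lower():
            kept.append(item)
    return kept


def stub_text_only_model(question: str) -> str:
    """Hypothetical stand-in for a text-only model: always guesses the same label."""
    return "pneumonia"


items = [
    BenchmarkItem("Most likely finding in this chest X-ray?", "Pneumonia", "a.png"),
    BenchmarkItem("Which side shows the fracture?", "left", "b.png"),
]
cleaned = keep_image_dependent(items, stub_text_only_model)
print([item.question for item in cleaned])
```

In this toy run the first question is removed because the stub guesses it without any image, and only the second question remains in the cleaned set.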

MetaClaw Framework Trains AI Agents While You're in Meetings by Checking Your Google Calendar
Researchers from UNC‑Chapel Hill, Carnegie Mellon, UC Santa Cruz and UC Berkeley introduced MetaClaw, a framework that continuously improves AI agents by learning from mistakes and fine‑tuning during idle times. The system uses an Opportunistic Meta‑Learning Scheduler that watches sleep...

Google's New Gemini API Agent Skill Patches the Knowledge Gap AI Models Have with Their Own SDKs
Google introduced an Agent Skill for the Gemini API that injects live SDK documentation and sample code into the model, eliminating the knowledge gap that plagues AI coding assistants. In a benchmark of 117 tasks, Gemini 3.1 Pro Preview’s success rate surged from...

Meta's Hyperagents Improve at Tasks and Improve at Improving
Meta, the University of British Columbia and collaborators introduced "hyperagents," AI systems that can rewrite both their task‑solving code and the underlying improvement mechanism. Built on the Darwin Gödel Machine framework, the new DGM‑H architecture lets the meta‑agent self‑modify, breaking...

Cohere Releases Open Source Model that Tops Speech Recognition Benchmarks
Cohere has launched Transcribe, an open‑source automatic speech recognition model that now leads the Hugging Face Open ASR Leaderboard with a 5.42% word error rate. The 2‑billion‑parameter system also records the highest throughput, processing audio 525 times faster than real...

Suno 5.5 Lets Users Sing Their Own AI-Generated Songs with a Personalized Voice Feature
Suno has rolled out version 5.5 of its AI music generator, branding it the most expressive model yet. The upgrade adds a Voices feature that lets Pro and Premier users record or upload their own singing voice, with a verification step...

OpenAI CEO Sam Altman Reportedly Teases a "Very Strong" Model Internally that Can "Really Accelerate the Economy"
OpenAI has completed pre‑training its next‑gen AI model, codenamed “Spud,” and CEO Sam Altman told staff it will be a “very strong” system ready in a few weeks, aimed at accelerating the economy. The company is reallocating compute by shutting...

OpenAI Expands Its Record Funding Round to over $120 Billion as It Eyes a Potential IPO Later This Year
OpenAI announced an additional $10 billion injection, pushing its record financing round beyond $120 billion. The expanded round brings in new backers such as Andreessen Horowitz, D.E. Shaw Ventures, MGX, TPG and T. Rowe Price, while Microsoft remains a key investor. CFO Sarah Friar hinted...

Popular AI Proxy LiteLLM Got Hacked with Malware that Spreads Through Kubernetes Clusters
Open‑source AI proxy library LiteLLM was compromised on PyPI, with versions 1.82.7 and 1.82.8 containing malware. The malicious code steals SSH keys, cloud credentials, database passwords, and Kubernetes configurations, encrypts them, and exfiltrates data to an external server while propagating...

Google DeepMind's Gemini 3.1 Flash-Lite Generates Websites Almost in Real Time
Google DeepMind unveiled Gemini 3.1 Flash‑Lite, a generative AI that builds webpages live from text prompts, effectively acting as a pseudo‑browser. The model delivers its first token 2.5 times faster than Gemini 2.5 Flash and processes over 360 tokens per...

Google Brings AI-Powered Dark Web Analysis to Enterprise Security Teams
Google Cloud announced at RSA 2026 an AI‑driven agent called “Triage and Investigation” within its Security Operations platform, automating alert review and reducing false positives for SOC analysts. The same rollout includes an AI‑powered dark‑web analysis tool that sifts through...

OpenAI Wants UK Regulators to Treat ChatGPT as a Google Search Alternative
OpenAI is urging the UK Competition and Markets Authority to list ChatGPT alongside Google on the CMA’s proposed "choice screens" for Android and Chrome users. The regulator previously designated Google as a strategic market player in search and is considering...

Xiaomi Launches Three MiMo AI Models to Power Agents, Robots, and Voice
Xiaomi’s MiMo team unveiled three new AI models—MiMo‑V2‑Pro, MiMo‑V2‑Omni, and MiMo‑V2‑TTS—aimed at powering agents, multimodal robotics, and expressive speech synthesis. The flagship MiMo‑V2‑Pro features a trillion‑parameter mixture‑of‑experts architecture with 42 billion active weights per request and a one‑million‑token context window, ranking...

Andrej Karpathy Says Humans Are Now the Bottleneck in AI Research with Easy-to-Measure Results
Andrej Karpathy spent months hand‑tuning a GPT‑2 training pipeline before handing it to an autonomous search agent for a single night. The agent uncovered fine‑grained adjustments that humans missed, demonstrating that systematic searches can outperform intuition when objective metrics exist....

OpenAI Publishes a Prompting Playbook that Helps Designers Get Better Frontend Results From GPT-5.4
OpenAI released a prompting playbook to help frontend designers generate higher‑quality UX/UI with its GPT‑5.4 model. The guide stresses defining a design system (colors, typography, layout) and supplying real content and visual references to avoid generic outputs. It also outlines hard rules...

Terence Tao Says AI Drives Idea Generation Cost to Near Zero but Shifts the Bottleneck to Verification
Mathematician Terence Tao says AI has driven the cost of idea generation in mathematics to near zero, creating a flood of hypotheses. The new bottleneck is verification, as existing journals and peer‑review processes are ill‑suited for machine‑produced proofs. Tao argues...

Qualcomm Shrinks AI Reasoning Chains by 2.4x to Fit Thinking Models on Smartphones
Qualcomm AI Research unveiled a modular framework that compresses the verbose reasoning chains of large language models by 2.4×, making them viable on smartphones. The system uses LoRA adapters to toggle between fast chat and deep reasoning modes, applies reinforcement‑learning...

ElevenLabs Now Lets You Sell AI Music You Don't Own
ElevenLabs has introduced a music marketplace that lets users upload and sell tracks generated by its ElevenCreative AI model. Creators receive revenue when their songs are downloaded, remixed, or licensed across three tiers: Social Media, Paid Marketing, and Offline. The platform...

Microsoft's Superintelligence Team Ships MAI-Image-2, a Text-to-Image Generator
Microsoft’s newly formed superintelligence team launched MAI-Image-2, its second‑generation text‑to‑image model. The system now sits third on the Arena.ai leaderboard, trailing OpenAI’s GPT‑Image‑1.5 and Google’s Nano Banana 2. Microsoft highlights photorealistic output, natural lighting, accurate skin tones, and reliable text rendering...

Midjourney V8 Rolls Out with 5x Faster Generation but Charges 4x More for Its Best Features
Midjourney has released an early‑access version of its V8 model, promising image generation up to five times faster and introducing a native 2K "--hd" mode and a higher‑quality "--q 4" setting. The new model improves adherence to detailed prompts, personalization,...

Microsoft Restructures AI Division to Chase Superintelligence After Nadella Once Called AI Models a Commodity
Microsoft is consolidating its Copilot commercial and consumer teams into a single division focused on four pillars: experience, platform, Microsoft 365 apps, and AI models. Jacob Andreou has been named Executive Vice President of Copilot Product Experience, reporting directly to...

OpenAI Reportedly Ditches Its "Side Quests" Strategy to Focus on Coding Tools and Business Customers
OpenAI is abandoning its "side quests" approach, consolidating resources around two core pillars: coding tools and business‑focused AI solutions. The shift follows internal criticism that a flood of products (Sora, Atlas, hardware devices, and more) stretched compute and talent thin, leading to...

AI-Generated War Footage Is Going Viral While Real Satellite Imagery Disappears From Public View
The New York Times identified over 110 AI‑generated war images and videos in the first two weeks of the U.S.–Israel–Iran conflict, reaching millions of viewers. Iran is deploying these deepfakes as a coordinated propaganda weapon, while real satellite imagery has...

RL Agents Go From Face-Planting to Parkour when Researchers Keep Adding Network Layers
A Princeton‑Warsaw team demonstrated that deepening reinforcement‑learning networks to up to 1,024 layers can boost performance by 2× to 50×, unlocking novel behaviors such as upright walking and parkour in simulated humanoids. The breakthrough relies on Contrastive RL, a self‑supervised algorithm that...

Hume AI Open-Sources TADA, a Speech Model Five Times Faster than Rivals with Zero Hallucinated Words
Hume AI has open‑sourced TADA, a speech‑generation model that aligns one audio frame with each text token, delivering over five‑fold speed gains versus existing systems. In tests of more than 1,000 samples, TADA produced zero hallucinated or omitted words and...

AI Chips Are Pushing Everything Else Off TSMC's Most Advanced Production Lines
AI accelerators are set to dominate TSMC's most advanced N3 production line, with 86% of capacity earmarked for AI chips by 2027. Utilization is projected to exceed 100% in the second half of 2026, highlighting a severe capacity shortfall. TSMC’s...

Grok 4.20 Trails Gemini and GPT-5.4 by a Wide Margin but Sets a New Record for Not Hallucinating
xAI's latest model, Grok 4.20 Beta, posted a 48 score on the Intelligence Index, trailing Gemini 3.1 Pro Preview and GPT‑5.4, which both achieved 57. Despite lower benchmark performance, Grok 4.20 set a new non‑hallucination record, achieving a 78% accuracy...

US War Department CTO Says Anthropic's AI Models "Pollute" the Supply Chain with Built-In Ethics
U.S. Department of Defense CTO Emil Michael classified Anthropic’s Claude models as a supply‑chain risk, arguing that the company’s built‑in ethics “pollute” the AI supply chain. He said the models’ constitution‑based policy preferences could deliver ineffective weapons and protection to...

OpenAI Is Reportedly Planning to Integrate Its Video AI Sora Into ChatGPT
OpenAI plans to embed its video‑generation AI, Sora, directly into ChatGPT, moving it from a standalone app to a core feature. The Sora app, once a top‑ranked download, has slipped to #165 in the Apple App Store and sees limited...

Claude's Excel and PowerPoint Add-Ins Now Share Context Across Apps
Anthropic has upgraded its Claude add‑ins for Excel and PowerPoint, enabling a shared conversation context so the AI can read cells, write formulas, and edit slides within a single session. The update also introduces “Skills,” reusable one‑click workflows for tasks...

OpenAI's New Training Dataset Teaches AI Models Which Instructions to Trust
OpenAI unveiled the IH‑Challenge dataset, a reinforcement‑learning resource that teaches models a four‑level instruction hierarchy—system, developer, user, and tool. The dataset replaces subjective LLM judges with deterministic Python scripts, enabling reliable automated evaluation. Early testing on the internal GPT‑5 Mini‑R...

German Court Says "It's AI" Isn't Enough to Void Copyright
A German regional court ruled that lyrics written by a person retain copyright protection even when the accompanying music is generated by AI, specifically SunoAI. The plaintiff authored the lyrics in April 2025, continued editing them during AI production, and provided...

Amazon Makes Senior Engineers the Human Filter for AI-Generated Code After a Series of Outages
Amazon has instituted a new policy requiring senior engineers to sign off on every AI‑generated code change after a string of high‑impact outages linked to generative AI tools. The internal memo from SVP Dave Treadwell cites a "trend of incidents"...

Meta Acquires Moltbook, the Reddit-Style Platform Built for AI Agents
Meta has acquired Moltbook, a Reddit‑style platform designed for AI agents, and will integrate its founders into the company’s Superintelligence Labs. The purchase price remains undisclosed, with the transaction slated to close in mid‑March. Moltbook launched in late January to...

Philosopher David Chalmers: Current AI Interpretability Methods Miss What Matters Most
David Chalmers argues that current AI interpretability focuses on mechanistic analysis and neglects the system's internal beliefs, desires, and intentions. He proposes "propositional interpretability"—a framework that treats AI attitudes like beliefs and goals as observable through "thought logging." Chalmers links...

OpenAI Employees Hint at a New Omni Model
OpenAI employees have hinted that the company is developing a new multimodal, or “omni,” model that could succeed GPT‑4o. Internal posts from Atty Eleti and researcher Brandon McKinzie sparked speculation about a next‑generation system capable of handling text, image, audio, and video...

Luma AI's New Uni-1 Image Model Tops Nano Banana 2 and GPT Image 1.5 on Logic-Based Benchmarks
Luma AI unveiled Uni-1, its first unified model that combines image understanding and generation within a single autoregressive transformer architecture. Unlike diffusion‑based systems, Uni-1 processes text and visual tokens sequentially, allowing it to reason through prompts and plan scenes before...

Trump Administration Drafts AI Contract Rules Requiring Companies to License Systems for "All Lawful Use"
The Trump administration has drafted GSA guidelines that would force AI vendors to grant the government an irrevocable license for "all lawful use" of their systems. The draft also bans ideological or partisan bias in AI outputs and requires companies...

When Language Models Hallucinate, They Leave "Spilled Energy" in Their Own Math
Researchers at Sapienza University introduced Spilled Energy, a training‑free metric that detects hallucinations by measuring energy gaps in a model's softmax layer. The method isolates answer tokens and flags higher energy when the model generates incorrect facts. Tested on nine...

OpenAI Offers Open-Source Maintainers Six Months of Free ChatGPT Pro and Codex Access
OpenAI announced a six‑month free access program for ChatGPT Pro and Codex aimed at core maintainers of public open‑source projects. The offer includes full Codex API credits and selective access to Codex Security, an AI‑driven code‑security tool powered by the upcoming...

ByteDance's Open-Weight Helios Model Brings Minute-Long AI Video Generation Close to Real Time
ByteDance released Helios, a 14‑billion‑parameter video model that generates minute‑long clips at 19.53 frames per second on a single H100 GPU. The distilled version slashes inference steps from 50 to three, achieving speeds comparable to much smaller 1.3B models while...

Anthropic Turns Claude Code Into a Background Worker with Local Scheduled Tasks
Anthropic has expanded its Claude Code AI coding assistant with a new "/loop" command that lets users schedule recurring background tasks. The feature supports standard cron expressions, allowing intervals from minutes to days, and can handle up to 50 tasks...

Anthropic's New Marketplace Lets Enterprise Customers Spend Their Existing AI Budget on Third-Party Tools
Anthropic announced the Anthropic Marketplace, a storefront where enterprise customers can purchase third‑party applications built on Anthropic’s AI models. Launch partners include Snowflake, Harvey, and Replit. The company will not charge commissions and allows customers to allocate part of their...

Yann LeCun Wants to Replace the AGI Concept with "Superhuman Adaptable Intelligence"
Researchers from Columbia, NYU, and startup Distyl, including Yann LeCun, argue that the artificial general intelligence (AGI) concept is fundamentally flawed. They contend human cognition is highly specialized and that existing AGI definitions either conflict with the No Free Lunch...

Alibaba's Chief AI Developer Quits, Taking Key Team Members with Him
Alibaba’s lead AI researcher Junyang Lin, the architect of the Qwen model series, announced his unexpected resignation. Several core engineers—including Qwen coder Binyuan Hui and post‑training specialist Bowen Yu—left the company on the same day. In response, CEO Eddie Wu...

OpenAI's Codex App Lands on Windows After Topping a Million Mac Downloads in Its First Week
OpenAI has launched a native Windows version of its Codex app, featuring a custom OS‑level sandbox that isolates AI agents and enforces token and file‑system permissions. The Mac release previously achieved over one million downloads in its first week, and...

ASML Plans to Expand Beyond Chip Lithography Into Advanced Packaging
ASML, the sole supplier of EUV lithography machines, announced plans to move into advanced packaging, a technique essential for AI chips and high‑bandwidth memory. The company will spend the next 10‑15 years researching equipment for chiplet stacking, bonding, and larger‑die...

ElevenLabs and Google Dominate Artificial Analysis' Updated Speech-to-Text Benchmark
Artificial Analysis released version 2.0 of its AA‑WER speech‑to‑text benchmark, ranking ElevenLabs' Scribe v2 as the most accurate model with a 2.3% word error rate. Google’s Gemini 3 Pro follows at 2.9% and Mistral’s Voxtral Small at 3.0%, while OpenAI’s Whisper Large v3 sits at...
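For context on the metric behind these rankings, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below shows that textbook definition only. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, and the benchmark's own text normalization rules are not described here and are likely to differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2%}")
```

Under this definition, a 2.3% word error rate corresponds to roughly 23 word-level errors per 1,000 reference words.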

Moltbook's Alleged AI Civilization Is Just a Massive Void of Bloated Bot Traffic
Researchers from the University of Maryland and MBZUAI conducted the first large‑scale study of Moltbook, a Reddit‑style platform populated solely by over 2.6 million autonomous LLM agents. Analyzing 290,000 posts and 1.8 million comments, they found the AI community to be socially...

Even Frontier LLMs From GPT-5 Onward Lose up to 33% Accuracy when You Chat Too Long
Researchers led by Philippe Laban evaluated frontier large language models from GPT‑5 onward across six diverse tasks and found that spreading a request over multiple conversation turns reduces accuracy by up to 33%. While newer models shrink the degradation from...