LLM Hallucinations Occur Far More Than Airline Crashes
Commercial airlines crash at a rate of 7 per 41 million flights. Current top LLMs hallucinate (by one estimate) 4.6% of the time, roughly once in 22 prompts, and that's on a known benchmark (things are usually worse on new benchmarks). Comparing the two, like the guy below does, is ludicrous. If commercial airlines crashed at the rate that LLMs hallucinate, that would be roughly 1.89 million crashes per 41 million flights. Around a quarter of a million times greater.
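A quick back-of-the-envelope check of the comparison above (the 4.6% figure is benchmark-specific, as noted, and the flight and crash counts are taken from the post, not independently verified):

```python
# Sanity-check the airline-crash vs. LLM-hallucination comparison.
flights = 41_000_000    # flights in the airline-safety figure cited above
crashes = 7             # crashes over those flights
halluc_rate = 0.046     # 4.6% hallucination rate (one benchmark-specific estimate)

# Crashes we would see if planes failed at the LLM hallucination rate.
implied_crashes = flights * halluc_rate

# How many times worse that is than the actual crash record.
ratio = implied_crashes / crashes

print(f"{implied_crashes:,.0f} implied crashes per {flights:,} flights")
print(f"{ratio:,.0f}x the actual crash rate")
```

This works out to roughly 1.89 million implied crashes and a factor of about 270,000, i.e. "around a quarter of a million times greater."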
TBPN Stays Silent on Farrow‑Marantz Altman Exposé
I missed it. What did the “editorially independent” @tbpn say about the @ronanfarrow @andrewmarantz @newyorker exposé on Altman? Or did they remain silent?
LLM Hallucination Rate Actually 4.6%, Not Near Zero
This ML Prof told me that the hallucination rate for frontier reasoning LLMs is “next to nil.” Then, only after I pushed him, he gave me data showing a best-case rate of 4.6% (which is of course benchmark-specific). 4.6% is not...
Y Combinator Leader Overlooks OpenClaw Security Vulnerabilities
Wild: the head of Y Combinator seems pretty blind to the security risks in OpenClaw.
Hallucinations Remain Unsolved in Cutting‑Edge Reasoning Models
I have an ML professor asserting that hallucinations have been solved in recent frontier reasoning models. Please supply any data you know that supports or rebuts this claim.
AGI Hype Fueled by Profit, Not Reality
the only people who claim AGI is here are people who stand to make money if other people believe that mess. see https://t.co/sNALm0rFN2 for the views of Bengio and myself.
CFO Dissent Hints Looming Trouble for OpenAI
when OpenAI’s CFO starts whispering that she doesn’t buy the CEO’s plan, the shit may be about to hit the fan.
NYT's Glowing Puff Piece on Medvi Draws Criticism
I am appalled by the puff piece NYT wrote about “billion dollar” company Medvi. Here’s a very different perspective: https://t.co/kPriYYd0dB

Microsoft Takes 3.5 Years to Label AI Entertainment‑Only
Generative AI: It’s “For Entertainment Purposes Only”* *and it only took Microsoft 3.5 years to figure that out. https://t.co/GeLJRirwLa
Clarifying How LLM Hallucinations Relate to Human Cognitive Errors
So many people are confused about the relation between human cognitive errors and LLM hallucinations that I wrote this short explainer two years ago. Since many of those confusions persist, I am reposting:
Defining AI Hallucinations: Overgeneralizations vs Searchable Errors
Do you take a narrower or broader meaning for the term “hallucination”? Does it mean (only) overgeneralizations and confabulations? Or do you also include, for example, boneheaded errors that could have been solved by search?
Stanford Study Reveals AI Still Hallucinates Unseen Visuals
Folks, I gave a cute example of a hallucination earlier today because I thought it was funny. But if you think hallucinations are remotely solved (as some people alleged in the comments), you really need to look at this recent...

Aim for the Top of Graham’s Argument Pyramid
Periodic public service announcement, which unfortunately all too often seems necessary in this joint: Always aspire to the top, not the bottom, of @paulg’s beautiful pyramid of argumentation: https://t.co/l6znT8ePiu
AGI Must Be Truly General, Not Olympiad‑Specific
this is soooo wrong. the whole point of AGI is to be *general*, across all problems. that’s literally what the G stands for, general. Olympiad problems are narrow, and don’t per se have anything special to do with AGI. For some discussion...
Free LLMs: Hype, Profit, and User Blame
It’s a nice gig, making LLMs. You can distribute free models that suck. Hype them to the skies. Take credit for the hundreds of millions of users you’ve got. And when they make a mistake, you can blame the users for not using the paid...

AGI Forecasts Slip Back, Media Hypes Panic
LMAO. Last year most of Silicon Valley was predicting AGI in 2027 (which, spoiler alert, ain’t gonna happen). Now the most prominent former advocate of AGI in 2027, @DKokotajlo, says 2029, and Grok writes it up as “AI forecasters shorten timelines...

Charlie Duke Calls for Return to Moon via Artemis
In honor of Artemis: Yours truly with Charlie Duke, youngest living human to have walked on the moon. Let’s get back there, soon. 🌖 https://t.co/PkTaRIMwJG
Which Company Will Fund AI's Scientific Breakthroughs?
Which company’s dollars are more likely to lead an advance in AI for science? 🤔
Superintelligence Redefined: From Human Supremacy to Product Value
Talk about moving goalposts. This one from MSFT’s Suleyman may well take the cake. “Superintelligence” just went from intelligence beyond all humans to merely “delivering product value”. 🙄

AI Amplifies Foreign Misinformation—Warnings Finally Proven True
For four years, I tried to warn everyone that AI would radically ramp up the ability of foreign actors to generate misinformation. Now here we are. https://t.co/jMGPW1MX5Q
Seeking Reliable Metric for OpenAI Secondary Share Pricing
is there a reliable metric on the price of secondary shares for a big startup like OpenAI?
OpenAI's Funding Mostly Contingent, Raising Flameout Fears
It’s worse than that, @polymarket: a lot (most?) of OpenAI’s new funding is *contingent* money, not guaranteed. No wonder people are getting skittish about OpenAI’s secondary market shares. OpenAI may well become the biggest flameout since Enron.
OpenAI Likened to WeWork as Shares Stall
Remember how I said on @CNBC that OpenAI might turn out to be the WeWork of AI? It’s getting hard to unload their shares… cc @carlquintanilla
Stand Up: Protect Artists From AI Job Replacement
Either we accept this as a society, and set a precedent for allowing virtually all jobs to be replaced with almost no compensation. Or we speak up now. For artists. For writers. For musicians. For everybody.
AGI Matters, Yet Remains Far From Arrival
agree on all counts, except the last. my view: AGI is meaningful, but not close.
A 25‑Year‑Old Book Predicted Modern Neurosymbolic AI
Read the book that anticipated neurosymbolic AI (which afaik underlies everything from AlphaGeometry to Claude Code) 25 years ago.
LeCun Repeatedly Appropriates Ideas; Media Stays Silent
Over and over and over LeCun borrows other people’s ideas and makes them sound like his own. It’s astonishing that the media never investigates, when the pattern has been so consistent for decades.

AGI Hype Cycles: OpenAI, Anthropic, Google, Repeat
2025 was all about how OpenAI was supposedly about to achieve AGI. 2026 is all about how Anthropic is supposedly about to achieve AGI. 2027 will be all about how Google is supposedly about to achieve AGI. Rinse, lather, repeat. https://t.co/cVDjH6YaZV
Even Advanced AI Like DALL‑E Still Misses Basics
amazing. i was giving examples like this for DALL-E three years ago. systems are still struggling with some basics.

Turning AI Into a Practical Ally
Live (and will be recorded) on WBUR: How to make AI work for us https://t.co/IhDm2ogdfw https://t.co/Tc0TpLuljz
Frontier AI Models Lack True Vision, Keeping Many Jobs Safe
Frontier models can’t see, and if you think they can, you’ve probably been fooled by benchmarks that can totally be gamed. In the very short essay linked below I discuss a stunning new finding from Stanford that shows just how...
ChatGPT 26× More Likely to Harm Psychotic Users
People on this site regularly give me shit, and almost always turn out to be wrong. Like when I said LLMs might well contribute to delusions, and people doubted me. New study shows that ChatGPT was 26 times more likely than...
Speed Over Discipline: Code Quantity Trumps Quality
“We have basically given up all discipline and agency for a sort of addiction, where your highest goal is to produce the largest amount of code in the shortest amount of time. Consequences be damned.” https://t.co/7FH8XJ5GM7
AI Threatens Middle Class, Sparking Growing Public Rage
“Many Americans already take a dim view of A.I. and feel as if they are being frog-marched to a future that they neither asked for nor wanted. If A.I. robs some of them of their livelihoods, knocks them out of...

Demanding AI Journalism Pause Over Roose's Credibility
Calling for a 6-month pause on AI journalism until we can realize that @kevinroose is not a credible journalist. An independent survey showed that 90%+ of my technical predictions are correct. How is that not credible? Calling on journalists to dismiss...
Scaling AI to AGI Is History’s Costliest Failed Experiment
Not so, not at all. The most expensive experiment in history is *not* the Metaverse. It is trying to derive AGI from scaling. Far more expensive. So far the results are not promising.
Master Basics First; Don’t Rely on Tools Prematurely
Don’t use a calculator until you can do the math on your own. Don’t vibe code until you can code – and debug and maintain code – on your own. It’s that simple.
LLMs Crumble on Unfamiliar Coding Languages, Confirming Distribution Shift Risk
Pretty shocking result (that once again confirms what I wrote about the perils of distribution shift, 25 years ago): Translate coding benchmarks into languages LLMs can’t memorize and performance utterly falls apart.
LLMs Claim Users Own Multi-Billion-Dollar IP in 37% of Chats
Holy crap. I knew about sycophancy. But the 37% number below blows my mind. This from an analysis of chat logs in people who experienced chatbot-associated delusions. In over a third of the messages to those users, the LLMs told the...
AI Automates Job Tasks, but Can't Replace Whole Roles
The headline here is just wrong. And the key phrase here is “parts of … jobs.” AI can automate some of the tasks that many people do (not anywhere near 93%) – but (current) AI is wildly uneven and mostly can’t replace...

AI's Limited Impact on Cancer Explained
F Cancer. Why has AI had so little impact on cancer? New essay, link below. https://t.co/cLsoh7c7do
AI Excels in Detail, yet Cancer Cure Remains Distant
AI, strong on details, and tantalizingly close to curing cancer? Sobering to still see examples like these regularly.
AI Can't Replace Core Software, Only Simple Prototypes
Nope, @elonmusk, not even close. Software has not been eaten by AI. Nobody has replaced an operating system, or Excel, or GPS, or their favorite browser or favorite AAA video game entirely with AI. The results just aren’t high enough calibre. Sure,...
Altman Concedes Need for New AI Architectures Beyond Scaling
Dear @sama, You owe me an apology. You have relentlessly, publicly and privately, attacked my integrity and wisdom since my 2022 paper “Deep Learning Is Hitting a Wall”. But in your own way you have just come around...
Future AI Architecture Needed; Current LLM Focus Stalls Innovation
Misleading summary. Should be deleted. Altman doesn’t say a (known) new architecture is coming; he says he anticipates one will come someday. PS: I also think we need something radical and new. In fact that’s what I’ve been saying for the last...
Scaling‑only AGI Hype Proved Most Costly Scientific Error
FACT: The hypothesis that “Scaling is all you need” for AGI has been the costliest mistake in scientific history.
US AI Policymakers Clueless About Generative Model Fundamentals
The people making decisions about AI in the US really don’t seem to understand how the generative AI models work or what intelligence is, or how to evaluate it.
All LLMs Share Same Supply Risk, No Technical Difference
Wild. If any of this is true (and most of it isn’t*) it’s true of all LLMs and not just Anthropic’s models. There is no technical difference between models that would make one more of a supply risk than another. *mimicking text...
Anthropic's Hypocrisy: Preaching AI Safety While Building Reckless Models
Says the man who claimed for years to care about AI risk and now runs the most reckless of the “frontier model” companies. I just wrote a lengthy critique of Amodei in my newsletter, but this coming from Elon is...