Gemini Pro Lags Behind Claude and GPT, Widening Gap
The Gemini Pro models do not seem to be iterating anywhere near as quickly as Claude or GPT (last release was 3.1 Pro in February). It's causing a growing performance gap between Google and the other two labs, and the Gemini 3.5 Flash model, good as it is, doesn't close it much.
Mass AI-Generated Content Creates Unmistakably Uniform Responses
Another thing about AI writing is that while a single instance of AI writing on a topic may be fine, any situation where lots of people use AI to respond to a particular prompt (comments sections, homework, admissions essays) the...

AI Coding Agents Multiply Code Output, Releases Rise Just 30%
Big paper on AI coding agents using GitHub & other data. The auto-complete tools (Copilot) led to 2.2x more code, local agents like the original Claude Code led to 7.4x, & current remote coding agents 17.3x(!). But human bottlenecks in coding mean...
Leaders Must Define AI Purpose, Not Just Train Users
Lots of companies are in the "encourage AI adoption" phase, whether teaching employees ChatGPT/Claude or (sigh) tokenmaxxing. That dodges the harder problems of firm leadership: What do you want people to use AI for? What work should be reserved for people?...

AI Model Improvements Accelerating, OpenAI & Anthropic Lead
It does seem like meaningfully better AI releases are accelerating, especially from OpenAI & Anthropic. To illustrate, I caused this timeline to be created. It only lists new models that scored 3 points or higher over previous models in the Artificial...
AI-Driven Software Needs Building and Experimental Failure
Reconstructing software engineering around AI is going to take work (even as the ability of AI to code increases at a rapid rate). Organizations are ideally spending tokens on two things: 1) building stuff and 2) experiments to figure out best practices (which involves...

AI Narrative Fingerprints Persist Despite Stylistic Prompts
There is a lot being written about the stylistic tells of AI writing (em-dashes, etc.), but this paper looks at AI narrative tells. Fascinating differences between AI & human narrative, and asking AI to write in different styles doesn't do much...
Gemini Omni’s Native Multimodal Video Editing Transforms Classic Footage
I think people don't realize why Gemini Omni is different than other video AIs. It is fully multimodal, so it can edit video natively, too. I took the famous "train" movie from 1896 & made it a bullet train, LEGO,...

GPT‑5.2 Matches Top Human Peer Reviewers in Study
Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers. "Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review..." though not without weaknesses....

Human Persuasion Boosts AI Compliance, Yet Newer Models Resist
🚨 Our paper is out in PNAS: we found classic human persuasion techniques worked on AIs in a "parahuman" way, making them agree to objectionable requests (upping compliance from 35% to 51%). It worked on a range of major LLMs, though newer...

Detecting AI Writing Is Frustratingly Unreliable
I broke my own rule to never post about AI detection as it is fraught in many ways. The problem is that if you use AI a lot, you know AI writing on sight, which makes the difficulty of objectively proving...

Can Super‑Intelligent AI Overcome Organizational Hurdles?
Had an interesting exchange with roon of OpenAI last night over whether super intelligent AI would actually be able to navigate organizational challenges. https://t.co/2i9wYO24s1
AI Labs Dropping Consulting Teams Would Signal ASI Confidence
You will know that the AI labs believe in ASI when they disband their newly formed consulting (sorry “forward deployed engineering”) groups. As long as people are required to figure out how AI is useful & do organizational change &...
gpt-realtime-2: Smarter Voice Model Beyond GPT‑4o
gpt-realtime-2 is a great voice model (with a typically bad OpenAI name). Voice models are natively processing speech, not transcribing it, so the intelligence of the model matters. The old voice model was GPT-4o level, this is much smarter (how...
Real‑time AI Collaboration: Beyond Jokes to Practical Use
Haven’t tried this but it seems very neat… Yet all of the demos (except maybe one) are the model being fun and/or annoying by correcting or reminding in real time. There are obvious uses for this sort of model in meetings,...