Ethan Mollick

Wharton professor; startups/innovation/AI and evidence-based tactics.

Social•May 11, 2026

Bigger LLMs Excel: GPT‑5.5 Tops Negotiation Benchmark

One of the most important properties of LLMs that we take for granted is that newer, bigger models are just better at everything. The AI Labs are pouring effort into economically valuable fields like coding, but bigger models are also better at negotiation, alignment, poetry, etc.

By Ethan Mollick

Social•May 11, 2026

Open AI Use Needed as Citation Hallucinations Surge 12‑fold

This seems like a critical reason to open up about AI use in academia. Scholars are using old AI models, badly, and not talking about it. New models hallucinate very few citations, and good agentic harnesses drop that further. Being...

By Ethan Mollick

Social•May 11, 2026

AI Prompts Mask Low-Quality Writing, Flooding Platforms

This is going to get even worse as people realize that careful tuning in their prompts can make AI writing seem not like AI writing to readers. We expect word counts to align, in some way, with thinking & value. Writing...

By Ethan Mollick

Social•May 11, 2026

Enterprises Demand Roadmap; Labs Push Exponential AI Scaling

Enterprises are going to actually want a coherent roadmap for the development of tools like Codex and Cowork, so they can plan and train and scale their use. This conflicts with the Labs’ vision where these tools rapidly scale exponentially in...

By Ethan Mollick

Social•May 10, 2026

AI Is No Longer Exclusive to San Francisco

I think we are past the point where “only people in San Francisco get AI” is true. AI users are in every industry & they have access to the same models. SF is far from the epicenter of many of...

By Ethan Mollick

Social•May 10, 2026

Humanizing Claude May Shape AI’s Future Impact

The personification of Claude — in name (the only AI with a human one), in training, in Anthropic’s philosophy (see Claude Constitution), in fanfiction (see the Claude cartoons), etc — feels quite consequential in the medium term, for better and...

By Ethan Mollick

Social•May 10, 2026

Apple’s New Siri Aims to Match AI Assistants

Apple may be planning to roll out its updated Siri based on 2024's vision at the moment when Claude Code and Codex (also OpenClaw) can increasingly do the actual assistant thing: read my emails & calendar, proactively spot & solve...

By Ethan Mollick

Social•May 10, 2026

AI Models Still Stuck in 2022 Mindset

I suspect there was a moment, probably 2022-2023, where anything you wrote publicly about AI that was popular is likely to still have influence over current models. Since then, the open internet has become less key to training but...

By Ethan Mollick

Social•May 9, 2026

Robotics Lacks Reliable Benchmarks Compared to AI Progress

As much as the state of benchmarks in AI is flawed, it is so much easier to track AI progress than robotics. Not sure what you can make of all the videos of robots running races or doing laundry -...

By Ethan Mollick

Social•May 8, 2026

Mythos Hype: No AI Miracle, but It Found Exploits

I realize that “Mythos as hype” means two different things to different groups. For insiders, it means “Mythos was not a magical step-change in AI ability.” For outsiders, it means “Mythos couldn’t really find zero-day exploits.” The latter was wrong,...

By Ethan Mollick

Social•May 8, 2026

Explore Snap, Crackle, and Pop: Interactive Kinematics Simulator

I have always found it charming that the fourth, fifth and sixth derivatives of position are snap, crackle, and pop. Because I could, I asked Codex to throw together a little simulation so you can play with them (as well...

By Ethan Mollick

Social•May 8, 2026

Guilded Professions Will Secure AI Safeguards; Freelancers Won’t

Professions with guilds or membership associations are going to get different AI policy reactions than those without. The Bar & the AMA will ensure that human doctors or lawyers are legally required for key activities. There is no equivalent organization for...

By Ethan Mollick

Social•May 7, 2026

General AI Models Naturally Discover Exploits, Not Hype

So Mythos was, indeed, not marketing hype. Remember this is a general purpose model that just happens to be good at finding exploits because good models are good at lots of things. Expect similar from OpenAI & Google. And from...

By Ethan Mollick

Social•May 7, 2026

OpenAI Lacks PowerPoint Integration Despite Image Generation Edge

OpenAI for Excel is quite useful (as is Claude for Excel), so it is surprising that, unlike Claude, there is no OpenAI for PowerPoint, especially because it is where OpenAI has a big advantage: Imagegen-2 can make very good slides/images...

By Ethan Mollick

Social•May 7, 2026

AI Alone Can't Drive Enterprise Transformation without Expert Guidance

The inability of AI systems to act as their own deployment consultants, process mappers, and change management experts is what makes AI use in enterprises so “normal” - the tools are powerful, but you need a lot more to...

By Ethan Mollick

Social•May 7, 2026

AI's Hype Masks Few Real Long-Term Strategies

Don’t let the exponential gains in ability fool you: there are fewer real grand plans in AI (or frankly any field of human endeavor) than you think. Companies are pivoting as the market changes, sometimes models are really good or...

By Ethan Mollick

Social•May 7, 2026

Seeing Meaning in AI: Text Pareidolia Reveals Confusion

Pareidolia, but for text. Apophenia, but for latent spaces. It's no wonder that our relationship to LLMs is so confusing.

By Ethan Mollick

Social•May 7, 2026

Unfunded $24B Could Have Delivered 100‑Day Prototype Vaccines

Every so often I think about how, in 2022, for $24B we could have had "prototype vaccines ready for each of the 26 known viral families that cause human disease" so they can be deployed in 100 days if there was...

By Ethan Mollick

Social•May 6, 2026

Deal Shakes Confidence in Grok's Frontier Model Status

I usually avoid commenting too much on industry deals, but this one is fascinating. Certainly seems like a blow to the idea that Grok will remain a frontier model.

By Ethan Mollick

Social•May 6, 2026

Human‑Centric Decisions Needed; Claude Alone Offers No Edge

A critical question in agent design is “how do we build agentic workflows so humans are given significant, interesting, or variance-producing decisions as they come up in the work?” A Claude-run company has no source of competitive advantage compared to other...

By Ethan Mollick

Social•May 5, 2026

Free GPT‑5.5 Instant Matches Paid 2025‑level Performance

All benchmarks are flawed, but GPQA has been fairly consistent & highly correlated with other measured benchmarks. I think it's a good way to see how far we've come that the free model from OpenAI, GPT 5.5 Instant, is at...

By Ethan Mollick

Social•May 5, 2026

NIST Should Conduct Public, Independent AI Capability Tests

In addition to the CAISI evaluation, it would be useful if NIST conducted public tests of AI abilities as an independent evaluator - though those obviously should not be pre-release tests & can be done when models are public. Independent testing...

By Ethan Mollick

Social•May 5, 2026

LLMs' Unreasonable Effectiveness Makes Them Universally Powerful

The unreasonable effectiveness of LLMs is what makes them so weird. The labs don’t need to decide what kind of AI to build, because better LLMs do better at most things. Finance? Pig disease identification? Restaurant suggestions? Coding? Yup. Most tech doesn’t...

By Ethan Mollick

Social•May 5, 2026

Claiming AI Expertise No Longer Improves Performance

A reminder that telling the AI that it is an expert in a field is no longer helpful in making the AI better at that field.

By Ethan Mollick

Social•May 5, 2026

Organizational AI Remains Jagged, Lacking Clear Terminology

For individual AI use, the jagged frontier is increasingly well understood. In multi-agent workflows in organizations, AI is jagged in ways that have not been well identified yet. In fact, we don't even have a vocabulary around multi-agent systems & the...

By Ethan Mollick

Social•May 4, 2026

Coastal Contrast: GPT‑5.5 Party Vs. Claude Finance Briefing

May 5 is the GPT-5.5 launch celebration in San Francisco and the Claude Finance Briefing in New York. Real opposite valence events on opposite coasts.

By Ethan Mollick

Social•May 4, 2026

AI Regulation Stumbles Without Reliable Risk Benchmarks

A challenge with AI regulation and vetting is how bad our benchmarks of AI model performance and risks are. There is no benchmark for risks and red-teaming requires experiments from dedicated specialist organizations & is not easy to put metrics...

By Ethan Mollick

Social•May 4, 2026

Retracted Study Undermines AI Education Claims, RCTs Prevail

My surprise here seems warranted: this paper was retracted. (There are other peer-reviewed meta-analyses of the impact of AI on education finding positive effects, like: https://t.co/bLHHelTLCs though the best evidence of AI helping is from RCTs of interventions with AI tutors)...

By Ethan Mollick

Social•May 3, 2026

AI Rewrites Cover Letters Creatively, but Newer Model Urges Restraint

Sometimes when I demo AI, I show it turning cover letters into goofy formats (poetry, etc) as an introduction to the idea of AI as translator between forms. For the first time, GPT-5.5 has been trying to get me to tone...

By Ethan Mollick

Social•May 3, 2026

AI Index Good for Rough Model Comparison, Not Trend Analysis

The Artificial Analysis index is a normalized score of several benchmarks (and has changed over time). It is fine for roughly comparing models, but it is not useful for trend analysis, and it is unclear what individual point differences in the...

By Ethan Mollick

Social•May 3, 2026

Anthropic’s Unique Bond with Claude Reshapes AI Development

I am not sure I would agree with all of this, but the relationship between Anthropic and Claude is quite different than the relationship between other labs and their models. And that shows up in lots of ways, from the...

By Ethan Mollick

Social•May 3, 2026

Open Models Lag Closed Ones in Robustness and Emergent Abilities

This is a good explanation of why the gap between open and closed models is larger than it appears in benchmarks. I would add in that current open models are also more fragile than closed: they handle out-of-distribution problems far...

By Ethan Mollick

Social•May 3, 2026

Benchmarks Undercount Agent Progress on Long Tasks

It's getting hard to benchmark frontier agent performance on longer tasks. Repeated measurement is very expensive and there are differences between using models in harnesses versus via APIs. I suspect benchmarks understate progress: they are built for models, not harnessed agents.

By Ethan Mollick

Social•May 2, 2026

Execs Base AI Lab Decisions on X Rumors, Not Reality

Generally, I would say X is not real life, but I am surprised about how often I get asked by executives about which AI lab is winning or what is up with a particular model in ways that indicate that...

By Ethan Mollick

Social•May 2, 2026

Agents Drive Rapid Shift From AI Bubble to Data‑Center Shortage

I was quoted a couple times in this Atlantic article, but that isn’t (the only) reason I think it is good. It lays out the reasons why we whipsawed from “AI is a bubble” to “there are not enough data...

By Ethan Mollick

Social•May 1, 2026

Organic AI Quirks Beat Forced Viral Moments

The goblin thing was fun as it was a real quirk that was emblematic of what makes AI interesting, and it organically came out of an AI user discovery. So was, for what it's worth, Ghiblitization. When the labs try...

By Ethan Mollick

Social•May 1, 2026

AI's Real Power Lies in Firm-Wide Integration

Organizations are already superhuman intelligences. The University of Pennsylvania or Walmart or whatever is far more capable than any human. That is why the focus on AIs as individual productivity tools hits a natural limit: many benefits of AI depend on...

By Ethan Mollick

Social•May 1, 2026

AI Generates Increasingly Whimsical 5x5 Image Grids

GPT-imagegen-2: "make 5x5 grid of dog photos, where each photo gets noticeably cuter" ...now cats ...now man-eating squid ...now covers of the book the Great Gatsby https://t.co/FdtCcWTzKC

By Ethan Mollick

Social•May 1, 2026

AI Chatbot Boosts Mexican Women's Mental Health and Employment

Randomized trial of an AI therapy chatbot on Mexican women found “improved mental health by 0.3 SD over 6 months with no evidence of an increase of severe cases; improved sleep, healthful behaviors, daily functioning & labor market outcomes.” Big results...

By Ethan Mollick

Social•Apr 30, 2026

Frequent AI Users Spot Synthetic Writing Instantly

"Load bearing," "I keep coming back to," "Not X, but Y." A curse of using AI a lot is that you realize how much of the writing around you is just AI now. People who don't use AI have been unable...

By Ethan Mollick

Social•Apr 30, 2026

Open‑source AI Models Pose New Regulatory Challenges

For better or worse, regulation for closed-source models served by a few (quite large) companies is easy. It is not as easy to imagine how you regulate open-source models that can be served by a range of decentralized players. Suspect...

By Ethan Mollick

Social•Apr 30, 2026

Native Apps Unlock More Power Than Model APIs

Increasingly, I think, we will see a gap between what you can do with frontier model APIs & what you can do with the native apps from the frontier labs (Codex, Claude Code). Models developed and trained with their native...

By Ethan Mollick

Social•Apr 30, 2026

Same AI Models, Divergent Paths: Startup vs Microsoft

It is really interesting that Microsoft and OpenAI have access to the exact same models at the exact same time, and they have done such different things with them. A rare pure experiment with a no-name startup and one of the...

By Ethan Mollick

Social•Apr 30, 2026

Gemini Chatbot Shows Promise but Lacks Integration and Perseverance

I think the Gemini chatbot has all the pieces to be a useful tool, but struggles to put it all together. It still doesn't seem to know what files it can create or how its tools work together. It also seems...

By Ethan Mollick

Social•Apr 30, 2026

Anthropic’s Cybersecurity Caution Limits Model, Rivals Unchecked

Mythos seems to be a very capable model based on available information, but it is not a cybersecurity model - it is an advanced general purpose model that happens to be good at cyber because it is good at a...

By Ethan Mollick

Social•Apr 30, 2026

AI Handles Ideas, Humans Still Run the Party

Illustration of the jagged frontier as a PR thing: 1) People had to ask the AI for a party date 2) People wrote the social media posts about the party, set up the invite list 3) People had to solicit AI for the...

By Ethan Mollick

Social•Apr 29, 2026

Gemini's Document Features Lag Behind True AI Capabilities

Gemini now can create documents, and it is a nice start, but not up to the frontier yet, as you can see from my "LBO of Hogwarts" test. PowerPoints are substantially worse than NotebookLM, spreadsheets are primitive, still no thinking trace,...

By Ethan Mollick

Social•Apr 29, 2026

AI's Growing Judgment Challenges Human-Only Role

One reason I don’t think “judgment” is going to be a distinctly human role in working with AI is that the most recent agentic models have gotten quite good at some types of judgment. You can’t do the kind of...

By Ethan Mollick

Social•Apr 29, 2026

AI Tutors with Teacher Guidance Boost Learning, Self-Use Harms

Yes, just having students “use AI to study” hurts learning (a helpful assistant is not a tutor), but using AI prompted to act like a tutor, especially with teacher support, seems to have large positive effects on learning in randomized...

By Ethan Mollick

Social•Apr 28, 2026

AI Work Forecasts Stuck in Pre‑Agentic Data

A big problem with all AI at work punditry right now is that it all rests on data from the pre-agentic era (which is basically just now ending) and we have very little information about what has been happening since...

By Ethan Mollick
