Key Takeaways
- LLMs understood paper methods but failed to produce correct numerical outcomes
- GPT‑5.3 Codex achieved an overall reproduction score of only 34%
- Five failure modes include formula errors, algorithm oversimplification, and debugging gaps
- Resource limits sometimes prevented correct simulations from completing
- Findings suggest automated AI researcher timelines may need to be extended
Pulse Analysis
The Peking University benchmark, dubbed PRBench, pushes LLMs beyond textbook questions into the gritty world of experimental physics. Unlike pure math or coding tasks, reproducing a paper's results demands deep physical intuition, correct parameter selection, and meticulous translation of theory into simulation code. The study shows that even the most advanced model, GPT‑5.3‑based Codex, could not produce a single fully correct end‑to‑end numerical result, highlighting a stark contrast between language fluency and scientific execution.
Why do these agents stumble? The analysis points to a combination of sparse training data for niche physical models, missing contextual cues about assumptions, and a lack of iterative debugging habits common among human researchers. LLMs often default to superficial code that compiles without errors, yet silently diverges from the intended physics. This mirrors broader challenges in AI safety: models can appear competent while harboring hidden flaws, especially when the evaluation metric rewards surface‑level comprehension over substantive verification.
For industry and academia, the implications are twofold. First, investors and product teams should temper hype around AI‑driven discovery platforms that claim to autonomously reproduce or extend scientific work. Second, the findings motivate a shift toward hybrid systems that combine LLMs with domain‑specific solvers, formal verification tools, or multi‑agent oversight to catch subtle physics errors. As AI continues to infiltrate R&D pipelines, building robust validation layers will be essential to bridge the gap between language proficiency and genuine scientific insight.
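The "compiles but silently diverges" failure mode has a classic illustration in numerical physics: a naive explicit Euler integrator for a harmonic oscillator runs without any error yet steadily injects energy, while a one-line reordering (symplectic, or Euler-Cromer, integration) keeps energy bounded. The sketch below is hypothetical and not from the PRBench study; it shows how a cheap physics-aware validation check (monitoring energy drift) can catch the kind of subtle error a surface-level evaluation would miss. All function names here are illustrative.

```python
import math

def simulate_oscillator(steps=10_000, dt=0.01, symplectic=False):
    """Unit-mass harmonic oscillator (k = 1), starting at x=1, v=0.

    The explicit-Euler branch runs without errors but multiplies the
    energy by (1 + dt**2) every step; the symplectic branch differs
    only in update order yet keeps energy bounded.
    """
    x, v = 1.0, 0.0
    for _ in range(steps):
        if symplectic:
            v -= x * dt        # update velocity first...
            x += v * dt        # ...then position with the NEW velocity
        else:
            x_new = x + v * dt  # naive explicit Euler: both updates
            v -= x * dt         # use the old state
            x = x_new
    return x, v

def energy(x, v):
    # Total energy of the unit oscillator: kinetic + potential.
    return 0.5 * v * v + 0.5 * x * x

def energy_drift(symplectic):
    """Validation layer: how far has energy drifted from its initial 0.5?"""
    x, v = simulate_oscillator(symplectic=symplectic)
    return abs(energy(x, v) - 0.5)
```

With dt = 0.01 over 10,000 steps, the explicit-Euler run ends with roughly e times its initial energy, so `energy_drift(False)` is large, while `energy_drift(True)` stays small. An automated check of this kind is exactly the sort of domain-specific verification the hybrid-system argument above calls for: it turns a silent physics error into a hard failure.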