DeepMind’s New AI Found A Strange New Way To Think

Two Minute Papers
Jun 5, 2026

Why It Matters

The work shows that system design, specifically the verification loop and judging infrastructure, can turn fallible large models into reliable tools for hard, decades-old mathematical problems, potentially accelerating automated theorem proving and lowering per-solution costs. It also shifts the focus of AI progress from model improvements alone to engineering better harnesses around models.

Summary

DeepMind’s new system, AlphaProof Nexus, attempted about 350 formalized Erdős problems and produced nine verified proofs, a failure rate above 95%, at a cost of a few hundred dollars per solved problem. The team used Lean for formal verification and a novel tournament-style loop: multiple AI-generated candidate proofs are iteratively judged by a cheaper verifier, which ranks partial solutions, and refined until one passes the validator. The approach accepts that the base models are unreliable, lying or failing repeatedly, and extracts reliable proofs by tightening the surrounding orchestration and validation. The experiment focused on a subset of easier-to-formalize problems and still required large models, underscoring both the promise and the current limits.
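
To make the tournament-style loop concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: every name in it (`tournament_search`, `propose`, `refine`, `judge`, `validate`) is hypothetical, and the "proof" is just an integer whose square must equal a target. The strict `validate` check stands in for Lean accepting a proof, the approximate `judge` score stands in for the cheaper verifier that ranks partial solutions, and `refine` is deliberately unreliable to mimic a model that often returns nonsense.

```python
import random
from typing import Callable, Optional

TARGET = 1369  # toy "theorem": find an integer c with c * c == TARGET


def propose(rng: random.Random) -> int:
    """Unreliable generator: a blind first guess."""
    return rng.randint(0, 100)


def refine(candidate: int, rng: random.Random) -> int:
    """Unreliable refiner: usually a small edit, sometimes pure nonsense."""
    if rng.random() < 0.3:
        return rng.randint(0, 100)  # a "hallucinated" rewrite
    return candidate + rng.randint(-5, 5)


def judge(candidate: int) -> float:
    """Cheap ranking of partial progress (higher is better). Assumed heuristic."""
    return -abs(candidate * candidate - TARGET)


def validate(candidate: int) -> bool:
    """Strict pass/fail check, standing in for the formal validator."""
    return candidate * candidate == TARGET


def tournament_search(
    propose: Callable[[random.Random], int],
    refine: Callable[[int, random.Random], int],
    judge: Callable[[int], float],
    validate: Callable[[int], bool],
    population: int = 16,
    survivors: int = 4,
    max_rounds: int = 200,
    seed: int = 0,
) -> Optional[int]:
    rng = random.Random(seed)
    candidates = [propose(rng) for _ in range(population)]
    for _ in range(max_rounds):
        # Only the strict validator can declare success.
        for c in candidates:
            if validate(c):
                return c
        # Otherwise the cheap judge ranks candidates and the best ones survive.
        ranked = sorted(candidates, key=judge, reverse=True)[:survivors]
        # Survivors are kept and also refined into new candidates.
        refined = [refine(rng.choice(ranked), rng) for _ in range(population - survivors)]
        candidates = ranked + refined
    # Check the final generation before giving up.
    return next((c for c in candidates if validate(c)), None)


result = tournament_search(propose, refine, judge, validate)
print("validated candidate:", result)
```

The point of the design is that the judge only steers the search; it never certifies anything. A wrong ranking costs a few wasted rounds, while a wrong answer can never be returned, because only `validate` can end the loop successfully.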

Original Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers
📝 The paper is available here:
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
Thumbnail design: https://felicia.hu
