DeepMind’s New AI Found A Strange New Way To Think
Why It Matters
The result suggests that system design, meaning the verification loop and judging infrastructure built around the model, can turn fallible large models into reliable tools for hard, decades-old mathematical problems, potentially accelerating automated theorem proving and lowering per-solution costs. It also shifts the focus of AI progress from model improvements alone to engineering better harnesses around models.
Summary
DeepMind’s new system, AlphaProof Nexus, attempted about 350 formalized Erdős problems and produced nine verified proofs, a failure rate of roughly 97%, at a cost of a few hundred dollars per solved problem. The team used Lean for formal verification and a novel tournament-style loop: the models generate multiple candidate proofs, a cheaper verifier judges and ranks the partial solutions, and the candidates are refined round after round until one passes the validator. The approach tolerates base models that are unreliable, making false claims or failing repeatedly, and extracts reliable proofs by tightening the surrounding orchestration and validation. The experiment focused on a subset of easier-to-formalize problems and still required large models, underscoring both the promise and the current limits.
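To make the tournament-style loop concrete, here is a minimal Python sketch of the general idea: generate candidates, let a cheap judge rank them, refine the best, and accept only what a strict validator passes. This is not DeepMind's implementation. Every name here (`tournament_search`, `generate`, `refine`, `judge`, `validate`, the pool sizes) is a hypothetical stand-in, and the "theorem" is a toy string-matching task rather than a Lean proof.

```python
import random
from typing import Callable, List, Optional

Candidate = str  # stand-in for a candidate proof; the real system checks proofs in Lean


def tournament_search(
    generate: Callable[[random.Random], Candidate],
    refine: Callable[[Candidate, random.Random], Candidate],
    judge: Callable[[Candidate], float],
    validate: Callable[[Candidate], bool],
    pool_size: int = 8,
    keep: int = 2,
    max_rounds: int = 200,
    seed: int = 0,
) -> Optional[Candidate]:
    """Return the first candidate accepted by `validate`, or None if the budget runs out."""
    rng = random.Random(seed)
    pool: List[Candidate] = [generate(rng) for _ in range(pool_size)]
    for _ in range(max_rounds):
        # Only the strict validator can declare success; the judge never can.
        for cand in pool:
            if validate(cand):
                return cand
        # The cheap judge only ranks partial attempts to decide what to refine next.
        survivors = sorted(pool, key=judge, reverse=True)[:keep]
        children = [refine(rng.choice(survivors), rng) for _ in range(pool_size - keep)]
        pool = survivors + children
    # Check the pool produced by the final round before giving up.
    return next((c for c in pool if validate(c)), None)


# Toy stand-ins (hypothetical): the "theorem" is matching a hidden target string.
TARGET = "qed"
ALPHABET = "deq"


def toy_generate(rng: random.Random) -> Candidate:
    # An unreliable "model": a random guess of the right length.
    return "".join(rng.choice(ALPHABET) for _ in TARGET)


def toy_refine(cand: Candidate, rng: random.Random) -> Candidate:
    # Rewrite one randomly chosen position; the change may help or hurt.
    i = rng.randrange(len(cand))
    return cand[:i] + rng.choice(ALPHABET) + cand[i + 1:]


def toy_judge(cand: Candidate) -> float:
    # Cheap partial-credit score: number of positions that already match.
    return float(sum(a == b for a, b in zip(cand, TARGET)))


def toy_validate(cand: Candidate) -> bool:
    # Strict pass/fail check, playing the role of the formal verifier.
    return cand == TARGET


if __name__ == "__main__":
    print(tournament_search(toy_generate, toy_refine, toy_judge, toy_validate))
```

The design point the sketch tries to capture is the split of responsibilities. The generator and refiner are allowed to be wrong most of the time, the judge only steers where effort goes, and correctness rests entirely on the validator. If the validator never accepts anything, the search returns `None` instead of a plausible-looking but unverified answer.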