Solving the Wrong Problem Works Better - Robert Lange

Machine Learning Street Talk
Mar 13, 2026

Why It Matters

By making evolutionary LLM systems more sample‑efficient and open‑ended, Shinka Evolve lowers barriers to AI‑driven discovery, positioning AI as a scalable partner for human creativity in science and engineering.

Key Takeaways

  • Evolutionary LLMs improve sample efficiency via program archives.
  • Co‑evolving problems and solutions yields richer, open‑ended discoveries.
  • Starting from impoverished solutions boosts diversity and novelty.
  • Shinka Evolve outperforms prior methods on tasks like circle packing.
  • Human creativity remains essential; AI acts as a powerful amplifier.

Summary

Robert Lange frames the conversation around evolutionary algorithms applied to large language models, highlighting his Shinka Evolve system as a concrete step toward open‑ended scientific discovery. He argues that current autonomous LLM pipelines often stall because they focus on a single, fixed problem, whereas true innovation may require inventing new problems and iteratively refining both tasks and solutions.

The core insight is sample efficiency: by maintaining an archive of programs, sampling parent solutions across “islands,” and using LLMs to edit or recombine code, Shinka Evolve reduces the number of evaluations needed to surpass benchmarks such as the classic circle‑packing task. Starting from impoverished or sub‑optimal seeds encourages broader exploration, while more constrained seeds converge quickly but limit novelty.

Lange cites concrete examples—AlphaEvolve's recursive matrix‑multiplication reduction, the leaked Nemo Claw agent platform, and the dramatic performance gains on circle packing—to illustrate how stepping‑stone accumulation and co‑evolution of problems and solutions can unlock breakthroughs that static prompts cannot achieve. He also references Kenneth Stanley's "open‑endedness" philosophy and recent work like POET, emphasizing the need for systems that can generate their own curricula.

The broader implication is a democratized research pipeline: open‑source, sample‑efficient evolutionary LLM tools could enable non‑experts to tackle complex scientific questions, while humans remain the source of deep understanding and creative direction. This shift suggests a future where AI amplifies human ingenuity rather than replacing it, reshaping how discovery is conducted across academia and industry.

Original Description

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss Shinka Evolve — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: systems like AlphaEvolve can optimize solutions to fixed problems, but real scientific progress requires co-evolving the problems themselves.
GTC is coming, the premier AI conference, great opportunity to learn about AI. NVIDIA and partners will showcase breakthroughs in physical AI, AI factories, agentic AI, and inference, exploring the next wave of AI innovation for developers and researchers. Register for virtual GTC for free, using my link and win NVIDIA DGX Spark (https://nvda.ws/4qQ0LMg)
In this episode:
• Why AlphaEvolve gets stuck — it needs a human to hand it the right problem. Shinka tries to invent new problems automatically, drawing on ideas from POET, PowerPlay, and MAP-Elites quality-diversity search.
• The architecture of Shinka: an archive of programs organized as islands, LLMs used as mutation operators, and a UCB bandit that adaptively selects between frontier models (GPT-5, Sonnet 4.5, Gemini) mid-run. The credit-assignment problem across models turns out to be genuinely hard.
• Concrete results — state-of-the-art circle packing with dramatically fewer evaluations, second place in an AtCoder competitive programming challenge, evolved load-balancing loss functions for mixture-of-experts models, and agent scaffolds for AIME math benchmarks.
• Are these systems actually thinking outside the box, or are they parasitic on their starting conditions? When LLMs run autonomously, "nothing interesting happens." Robert pushes back with the stepping-stone argument — evolution doesn't need to extrapolate, just recombine usefully.
• The AI Scientist question: can automated research pipelines produce real science, or just workshop-level slop that passes surface-level review? Robert is honest that the current version is more co-pilot than autonomous researcher.
• Where this lands in 5-20 years — Robert's prediction that scientific research will be fundamentally transformed, and Tim's thought experiment about alien mathematical artifacts that no human could have conceived.
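The UCB bandit mentioned above — adaptively routing mutation requests between frontier models mid-run — can be sketched with standard UCB1. This is an illustrative simplification, not Shinka's actual mechanism: the reward here is assumed to be the child-minus-parent fitness gain, which glosses over exactly the credit-assignment difficulty the episode discusses.

```python
import math

class UCBModelSelector:
    """UCB1 bandit over LLM 'arms' used as mutation operators."""

    def __init__(self, models):
        self.models = models
        self.counts = {m: 0 for m in models}    # pulls per model
        self.totals = {m: 0.0 for m in models}  # cumulative reward per model

    def select(self):
        # Play every arm once before applying the UCB1 formula.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        t = sum(self.counts.values())

        def ucb(m):
            mean = self.totals[m] / self.counts[m]
            # Exploration bonus shrinks as an arm is sampled more often.
            return mean + math.sqrt(2 * math.log(t) / self.counts[m])

        return max(self.models, key=ucb)

    def update(self, model, reward):
        self.counts[model] += 1
        self.totals[model] += reward

# Hypothetical arm names, mirroring the models named in the episode.
selector = UCBModelSelector(["gpt-5", "sonnet-4.5", "gemini"])
arm = selector.select()
selector.update(arm, reward=0.1)  # e.g. fitness improvement of the child
```

The appeal of a bandit here is that model quality is task-dependent and non-stationary across a run, so hard-coding one model leaves gains on the table; the open question Robert flags is attributing a program's eventual success back to the right model in its lineage.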

TIMESTAMPS:
00:00:00 Introduction: Robert Lange, Sakana AI and Shinka Evolve
00:04:15 AlphaEvolve's Blind Spot: Co-Evolving Problems with Solutions
00:09:05 Unknown Unknowns, POET, and Auto-Curricula for AI Science
00:14:20 MAP-Elites and Quality-Diversity: Shinka's Evolutionary Architecture
00:28:00 UCB Bandits, Mutations and the Vibe Research Vision
00:40:00 Scaling Shinka: Meta-Evolution, Democratisation and the Three-Axis Model
00:47:10 Applications, ARC-AGI and the Future of Work
00:57:00 The AI Scientist and the Human Co-Pilot: Who Steers the Search?
01:06:00 AI Scientist v2, Slop Critique and the Future of Scientific Publishing

REFERENCES:
paper:
[00:03:30] ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
[00:04:15] AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery
[00:06:30] Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
[00:09:05] Paired Open-Ended Trailblazer (POET)
[00:10:00] PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem
[00:10:40] Automated Capability Discovery via Foundation Model Self-Exploration
[00:15:30] Illuminating Search Spaces by Mapping Elites (MAP-Elites)
[00:47:10] Automated Design of Agentic Systems (ADAS)
[00:49:50] Discovering Preference Optimization Algorithms with and for Large Language Models (DiscoPOP)
[00:57:00] The AI Scientist v2: Automating the Full Research Pipeline
book:
[00:06:48] Why Greatness Cannot Be Planned
benchmark:
[00:47:10] ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
[00:50:50] On the Measure of Intelligence (ARC-AGI)
