AI X-risk Research Podcast (AXRP)
Understanding program equilibria is crucial as AI systems become more capable of inspecting and predicting each other's behavior, with direct consequences for the safety and coordination of autonomous agents. The episode highlights both theoretical advances and practical implications for building robust, cooperative AI, making it timely for researchers and policymakers concerned with AI alignment and multi‑agent interactions.
Program equilibrium reframes classic game‑theoretic dilemmas by allowing each player to submit a computer program that can inspect the opponent's source code at runtime. This shift creates a "program game" in which strategies are themselves programs, so agents can condition their actions on the exact code they face. For AI safety researchers, the model offers a concrete testbed for studying multi‑agent coordination, transparency, and the emergence of cooperation when agents can reason about each other's algorithms rather than just their observable moves.
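As a rough illustration of the setup (my own sketch, not code from the episode; the names `program_game`, `defect_bot`, and `cooperate_bot` are hypothetical), a program game for the Prisoner's Dilemma can be modeled by letting each player submit a function that receives the opponent's program, which it can read with `inspect.getsource` or simulate by calling it, and returns a move:

```python
import inspect

# Illustrative Prisoner's Dilemma payoffs (row player, column player),
# with the usual ordering: temptation > reward > punishment > sucker.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def program_game(prog_a, prog_b):
    """One round of the program game: each submitted program receives
    the opponent's program and returns 'C' or 'D'. Passing the function
    object (rather than a source string) is a simplification; a program
    can still read the opponent's text via inspect.getsource."""
    return PAYOFFS[(prog_a(prog_b), prog_b(prog_a))]

def defect_bot(opponent):
    """Unconditional defection, the dominant one-shot PD strategy."""
    return "D"

def cooperate_bot(opponent):
    """Unconditional cooperation, exploitable by defect_bot."""
    return "C"

print(program_game(defect_bot, cooperate_bot))  # (5, 0)
```

The conditional programs sketched further down reuse this toy harness.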
A striking result from the early papers is the simple equality‑check program: if the opponent's code matches yours, cooperate; otherwise defect. When both players submit this program, it forms a Nash equilibrium that achieves the mutually cooperative, Pareto‑optimal payoff in the Prisoner's Dilemma, a game whose only one‑shot equilibrium is mutual defection. However, the equilibrium's stability hinges on exact syntactic matching. Minor differences (extra spaces, variable names, or compilation flags) break the cooperation, exposing a gap in the literature: the notion of a "robust" program equilibrium has remained informal, without a solution concept that tolerates such superficial variations.
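A minimal sketch of that equality check, reusing the hypothetical harness above (the names `equality_bot` and `equality_bot_rewrite` are mine), also shows how a purely cosmetic rewrite destroys the cooperative outcome:

```python
import inspect

def equality_bot(opponent):
    """Cooperate iff the opponent's source text exactly matches mine."""
    mine = inspect.getsource(equality_bot)
    return "C" if inspect.getsource(opponent) == mine else "D"

def equality_bot_rewrite(opponent):
    """Behaviorally identical to equality_bot, but its different name
    (and hence different source text) makes the syntactic check fail."""
    mine = inspect.getsource(equality_bot_rewrite)
    return "C" if inspect.getsource(opponent) == mine else "D"

# program_game(equality_bot, equality_bot)          -> (3, 3): mutual cooperation
# program_game(equality_bot, equality_bot_rewrite)  -> (1, 1): mutual defection
```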
Practically, achieving the required code alignment is challenging without prior coordination, limiting real‑world applicability. Researchers must devise mechanisms for robust code agreement, perhaps through cryptographic hashes, standardized libraries, or meta‑programs that abstract away syntactic details. The discussion also touches on broader themes in cooperative AI and decision‑theory fairness, where agents should be judged by behavior rather than hidden internal models. Advancing robust, behavior‑based program equilibria could unlock safer multi‑agent systems that reliably cooperate even when exact source code cannot be shared, a key step toward scalable AI alignment.
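One behavior‑based direction the episode spends a lot of time on is simulation: rather than comparing code, a program can probabilistically run the opponent against itself and mirror the result. The sketch below is my paraphrase of that ε‑grounded idea in the same hypothetical harness; the function name and the value of EPSILON are illustrative, not taken from the paper:

```python
import random

EPSILON = 0.05  # small "grounding" probability that ends the recursion

def epsilon_grounded_fair_bot(opponent):
    """With probability EPSILON, cooperate outright; otherwise simulate
    the opponent playing against this program and copy its move.

    Against any program that behaves the same way (even one written
    differently), the nested simulations halt with probability 1,
    because each level cooperates outright with probability EPSILON,
    and the realized outcome is mutual cooperation. Against an
    unconditional defector it defects with probability 1 - EPSILON."""
    if random.random() < EPSILON:
        return "C"
    return opponent(epsilon_grounded_fair_bot)

# program_game(epsilon_grounded_fair_bot, epsilon_grounded_fair_bot)
#   -> (3, 3) with probability 1
# program_game(epsilon_grounded_fair_bot, defect_bot)
#   -> (1, 1) with probability 1 - EPSILON
```

Against a copy of itself this gets the same payoff as the equality bot, but the cooperation no longer depends on how the other program happens to be written, which is the kind of robustness the episode is concerned with.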
How does game theory work when everyone is a computer program who can read everyone else's source code? This is the problem of 'program equilibria'. In this episode, I talk with Caspar Oesterheld on work he's done on equilibria of programs that simulate each other, and how robust these equilibria are.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2026/02/18/episode-49-caspar-oesterheld-program-equilibrium.html
Note from Caspar on 2:00:06: At least given my current interpretation of what you say here, my answer is wrong. What actually happens is that we're just back in the uncorrelated case. Basically my simulations will be a simulated repeated game in which everything is correlated because I feed you my random sequence and your simulations will be a repeated game where everything is correlated. Halting works the same as usual. But of course what we end up actually playing will be uncorrelated. We discuss something like this later in the episode.
Topics we discuss, and timestamps:
0:00:44 Program equilibrium basics
0:14:20 Desiderata for program equilibria
0:24:35 Why program equilibrium matters
0:33:35 Prior work: reachable equilibria and proof-based approaches
0:53:26 The basic idea of Robust Program Equilibrium
1:07:47 Are ϵGroundedπBots inefficient?
1:15:06 Compatibility of proof-based and simulation-based program equilibria
1:18:32 Cooperating against CooperateBot, and how to avoid it
1:44:43 Making better simulation-based bots
2:01:22 Characterizing simulation-based program equilibria
2:21:24 Follow-up work
2:29:49 Following Caspar's research
Links for Caspar:
Academic website: https://www.andrew.cmu.edu/user/coesterh/
Google Scholar: https://scholar.google.com/citations?user=xeEcRjkAAAAJ&hl=en
Blog: https://casparoesterheld.com/
X / Twitter: https://x.com/c_oesterheld
Research we discuss:
Robust program equilibrium: https://link.springer.com/article/10.1007/s11238-018-9679-3
Characterising Simulation-Based Program Equilibria: https://arxiv.org/abs/2412.14570
Manifold open-source prisoner's dilemma tournament: https://manifold.markets/IsaacKing/which-240-character-program-wins-th
Results of Alex Mennen's open source prisoner's dilemma tournament: https://www.lesswrong.com/posts/QP7Ne4KXKytj4Krkx/prisoner-s-dilemma-tournament-results-0
A General Counterexample to Any Decision Theory and Some Responses: https://arxiv.org/abs/2101.00280
Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory: https://arxiv.org/abs/2208.07006
Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents: https://arxiv.org/abs/1602.04184
A Note on the Compatibility of Different Robust Program Equilibria of the Prisoner's Dilemma: https://arxiv.org/abs/2211.05057
Episode art by Hamish Doodles: hamishdoodles.com