Stanford CS221 | Autumn 2025 | Lecture 11: Games II

Stanford Online | Mar 9, 2026

Why It Matters

Learning evaluation functions via TD reinforcement learning lets AI agents scale to complex, deterministic games without exhaustive search, transforming hand‑crafted heuristics into adaptable, data‑driven strategies.

Key Takeaways

  • Minimax solves zero‑sum games via alternating max/min nodes.
  • Alpha‑beta pruning cuts branches when bounds no longer overlap.
  • TD learning estimates state values, analogous to SARSA for Q‑values.
  • Function approximation enables RL in games with exponential state spaces.
  • In deterministic games, optimal policies can be derived directly from V‑values.
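The first two takeaways can be sketched together. Below is a minimal minimax with alpha‑beta pruning for an abstract two‑player zero‑sum game; the `Game` interface (`actions`, `succ`, `is_terminal`, `evaluate`) is a hypothetical API chosen for illustration, not the lecture's exact code.

```python
def alphabeta(game, state, depth, alpha=float("-inf"), beta=float("inf"),
              maximizing=True):
    """Depth-limited minimax with alpha-beta pruning."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)  # heuristic or learned evaluation
    if maximizing:
        value = float("-inf")
        for action in game.actions(state):
            value = max(value, alphabeta(game, game.succ(state, action),
                                         depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # min player already has a better option elsewhere
                break           # prune the remaining branches
        return value
    else:
        value = float("inf")
        for action in game.actions(state):
            value = min(value, alphabeta(game, game.succ(state, action),
                                         depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:   # max player already has a better option elsewhere
                break
        return value
```

Pruning fires exactly when the bounds `alpha` and `beta` stop overlapping, which is the condition stated in the takeaway above.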

Summary

The lecture revisits two‑player zero‑sum games, reviewing the minimax principle and alpha‑beta pruning before introducing reinforcement‑learning techniques to learn game evaluation functions. Professor Liang explains why hand‑crafted heuristics, such as chess piece‑value tables, can be replaced by learned value networks.
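A hand‑crafted heuristic of the kind being replaced can be sketched as a linear evaluation over material features. The weights below are the classic chess piece values; the feature extractor is illustrative, not the lecture's exact formulation.

```python
# Classic material values used as fixed, hand-crafted weights.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_features(my_counts, opp_counts):
    """Feature vector: net count of each piece type (mine minus opponent's)."""
    return {p: my_counts.get(p, 0) - opp_counts.get(p, 0) for p in PIECE_VALUES}

def evaluate(my_counts, opp_counts):
    """Linear evaluation: weighted sum of material features."""
    feats = material_features(my_counts, opp_counts)
    return sum(PIECE_VALUES[p] * feats[p] for p in PIECE_VALUES)
```

A learned evaluation keeps this linear (or neural) form but fits the weights from play instead of fixing them by hand.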

Key insights include the parallel between SARSA (which estimates Q‑values) and TD learning (which estimates V‑values), the simplifications that deterministic game transitions afford, and the necessity of function approximation when state spaces grow exponentially. The instructor walks through the TD update rule, the construction of a loss function, and the gradient step that nudges the value network toward the bootstrapped target.
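The TD update described above can be sketched for a linear value function V_w(s) = w · φ(s). Following the distinction drawn in the lecture, the bootstrapped target r + γ·V_w(s') is treated as a constant, so no gradient flows through it; the function and parameter names are illustrative.

```python
import numpy as np

def td_update(w, phi_s, phi_sp, reward, gamma=1.0, eta=0.1, terminal=False):
    """One TD(0) gradient step on weights w for V_w(s) = w . phi(s)."""
    v_s = w @ phi_s
    # Bootstrapped target, held fixed (no gradient through it).
    target = reward + (0.0 if terminal else gamma * (w @ phi_sp))
    # Gradient of 0.5 * (V_w(s) - target)^2 w.r.t. w is (V_w(s) - target) * phi(s),
    # so the step nudges V_w(s) toward the target.
    return w - eta * (v_s - target) * phi_s
```

If the target were back‑propagated through instead, an extra -γ·φ(s') term would appear in the gradient, which is exactly the distinction the lecture highlights.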

Illustrative examples feature a concrete SARSA update, a deterministic successor function, and a simple “tram problem” implementation that shows how V‑values can be turned into Q‑values for action selection. Code snippets demonstrate epsilon‑greedy exploration, target computation, and the distinction between treating the target as a constant versus back‑propagating through it.
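The V‑to‑Q conversion and epsilon‑greedy selection mentioned above can be sketched as follows. In a deterministic game the successor is a function of (state, action), so Q(s, a) = reward(s, a) + V(succ(s, a)); the names `succ`, `reward`, and `V` are placeholder callables, not the lecture's code verbatim.

```python
import random

def q_from_v(state, action, succ, reward, V):
    """Deterministic transitions: Q(s, a) = r(s, a) + V(succ(s, a))."""
    return reward(state, action) + V(succ(state, action))

def epsilon_greedy(state, actions, succ, reward, V, epsilon=0.1, rng=random):
    """Explore with probability epsilon, else pick the best action under V."""
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore: random action
    return max(actions,                     # exploit: argmax over derived Q-values
               key=lambda a: q_from_v(state, a, succ, reward, V))
```

This is why deterministic games only need a learned V: the policy falls out of V plus the known successor function, with no separate Q table.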

The broader implication is that even when the game’s rules are fully known, reinforcement learning remains valuable because exact value iteration is infeasible for large games. Learning a compact value network enables scalable policy extraction, bridging classic game‑tree search with modern deep RL approaches.

Original Description

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai
Please follow along with the course schedule: https://stanford-cs221.github.io/autumn2025/
Teaching Team
Percy Liang, Associate Professor of Computer Science (and courtesy in Statistics)
