Stanford CS221 | Autumn 2025 | Lecture 10: Games I

Stanford Online
Mar 9, 2026

Why It Matters

Understanding game evaluation and expectimax equips AI practitioners to design agents that can anticipate and counter adversarial behavior, a capability critical for competitive applications such as automated gameplay, security, and strategic planning.

Key Takeaways

  • Games require strategies that anticipate an opponent’s unknown actions.
  • Zero‑sum two‑player games model the opponent’s utility as the negative of the agent’s.
  • Game value equals expected utility given policies, computed via recursion.
  • Expectimax replaces the expectation at the agent’s own nodes with a max — the opponent remains stochastic — to find the optimal agent policy.
  • Exact evaluation is exponential; Monte‑Carlo rollouts offer practical estimates.

Summary

The lecture introduces game theory as the next step after Markov decision processes and reinforcement learning, focusing on two‑player zero‑sum games. It defines a game formally with start states, player‑turn functions, and successor mappings, and emphasizes that utility is realized only at terminal nodes, creating a sparse‑reward problem.
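
To make the formalism concrete, here is a minimal sketch of such a game definition — a halving-style game in which players alternately subtract one or halve a number, and whoever reaches 0 wins. The interface names (`start_state`, `is_end`, `utility`, `player`, `actions`, `succ`) are illustrative assumptions, not the lecture’s exact code:

```python
# Minimal sketch of a two-player zero-sum game (a halving-style game).
# A state is (number, player): player +1 is the agent, -1 the opponent.

def start_state(n):
    """Start state: the agent moves first."""
    return (n, +1)

def is_end(state):
    """The game ends when the number reaches 0."""
    return state[0] == 0

def utility(state):
    """Utility is realized only at terminal states: whoever moved TO 0
    wins, so the player now on turn has lost."""
    return -state[1]  # +1 if the agent made the winning move

def player(state):
    """Whose turn it is at this state."""
    return state[1]

def actions(state):
    """Legal moves: subtract one ('-') or halve ('/', integer division)."""
    return ['-', '/']

def succ(state, action):
    """Successor state: apply the move and hand the turn over."""
    n, p = state
    return ((n - 1 if action == '-' else n // 2), -p)
```

Note how utility appears only at terminal states — every intermediate state returns nothing, which is exactly the sparse-reward structure the lecture points out.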

Key concepts include the game tree representation, deterministic and stochastic policies, and the recursive evaluation of game value (V_eval). By weighting each successor by the acting player’s action probabilities and taking expectations, the expected utility of a fixed policy pair can be computed exactly—though at exponential cost. The instructor demonstrates this with a simple bin‑selection game and a halving game, showing how Monte‑Carlo rollouts approximate the true value and how the expectimax recurrence replaces the expectation at the agent’s own nodes with a max operator—while still taking the expectation over the stochastic opponent—to derive optimal agent policies.
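
The two recursions can be sketched as follows on a toy halving-style game, restated inline so the snippet is self-contained; all names here are illustrative assumptions rather than the lecture’s exact code:

```python
# Toy game: state = (number, player); subtract 1 or halve (integer
# division); whoever moves to 0 wins (+1 for the agent, -1 otherwise).
def is_end(s):   return s[0] == 0
def utility(s):  return -s[1]          # the previous mover won
def actions(s):  return ['-', '/']
def succ(s, a):
    n, p = s
    return ((n - 1 if a == '-' else n // 2), -p)

def pi_random(s, a):
    """A fixed stochastic policy: uniform over legal actions."""
    return 1.0 / len(actions(s))

def V_eval(s, pi_agent, pi_opp):
    """Expected utility when BOTH players follow fixed stochastic
    policies: at every node, an expectation over the mover's actions."""
    if is_end(s):
        return utility(s)
    pi = pi_agent if s[1] == +1 else pi_opp
    return sum(pi(s, a) * V_eval(succ(s, a), pi_agent, pi_opp)
               for a in actions(s))

def V_exptmax(s, pi_opp):
    """Expectimax: the agent maximizes at its own nodes, while the
    opponent still plays its fixed stochastic policy."""
    if is_end(s):
        return utility(s)
    if s[1] == +1:  # agent's turn: expectation replaced by a max
        return max(V_exptmax(succ(s, a), pi_opp) for a in actions(s))
    return sum(pi_opp(s, a) * V_exptmax(succ(s, a), pi_opp)
               for a in actions(s))
```

Both recursions visit every node of the game tree, which is where the exponential cost comes from; the structural difference is only the `max` at agent nodes.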

Notable examples include a poll in which most students chose bin C, a demonstration that the Monte‑Carlo average utility under random policies converges to zero, and a step‑by‑step code illustration of V_eval and expectimax. The lecture also highlights that while exact evaluation mirrors policy evaluation in MDPs, practical AI systems rely on sampling or more sophisticated search techniques to avoid exponential blow‑up.
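
The rollout idea can be sketched like this (an illustrative toy game and names, not the lecture’s code): simulate many complete games under the fixed policies and average the terminal utilities. Each rollout costs only the depth of one path rather than the size of the exponential tree, and by the law of large numbers the average approaches the exact value.

```python
import random

# Toy game: state = (number, player); subtract 1 or halve (integer
# division); whoever moves to 0 wins (+1 for the agent, -1 otherwise).
def is_end(s):   return s[0] == 0
def utility(s):  return -s[1]
def actions(s):  return ['-', '/']
def succ(s, a):
    n, p = s
    return ((n - 1 if a == '-' else n // 2), -p)

def rollout(s, rng):
    """Play one game to the end with both players acting uniformly
    at random; return the terminal utility."""
    while not is_end(s):
        s = succ(s, rng.choice(actions(s)))
    return utility(s)

def monte_carlo_value(s, num_rollouts=10000, seed=0):
    """Average terminal utility over many simulated games: an
    unbiased estimate of the exact V_eval under these policies."""
    rng = random.Random(seed)
    return sum(rollout(s, rng) for _ in range(num_rollouts)) / num_rollouts
```

For the start state `(3, +1)` the exact value under uniform play is 0, and the estimate above hovers near 0 with error shrinking like one over the square root of the number of rollouts.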

The material underscores the bridge between reinforcement learning and adversarial decision making, preparing students to apply search algorithms, Monte‑Carlo methods, and expectimax in domains ranging from board games to real‑world competitive environments. Mastery of these concepts is essential for building agents that can reason about opponents and compute optimal strategies efficiently.

Original Description

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai
Please follow along with the course schedule: https://stanford-cs221.github.io/autumn2025/
Teaching Team
Percy Liang, Associate Professor of Computer Science (and courtesy in Statistics)
