Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning
Why It Matters
Learning Bayesian network parameters from data transforms abstract probabilistic models into actionable tools for AI, enabling accurate inference and scalable decision‑making across diverse industries.
Key Takeaways
- Bayesian networks define joint distributions via directed acyclic graphs.
- Parameter learning reduces to counting occurrences and normalizing frequencies.
- Conditional independence enables parallel inference and simplifies computations.
- Fully observed data allows straightforward maximum likelihood estimation for CPTs.
- V-structures require careful handling but follow the same count-and-normalize principle.
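The count-and-normalize idea behind these takeaways can be sketched as a maximum-likelihood estimate for a single discrete variable (the dataset and variable name below are illustrative, not from the lecture):

```python
from collections import Counter

def mle_single(values):
    """MLE for a single discrete variable: count occurrences, then
    normalize the counts so the probabilities sum to one."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Hypothetical movie-rating observations
ratings = [5, 4, 5, 3, 5, 4]
print(mle_single(ratings))  # {5: 0.5, 4: 0.333..., 3: 0.166...}
```

The same two steps, counting and normalizing, reappear unchanged as networks grow more complex; only the configurations being counted change.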
Summary
The lecture revisits Bayesian networks as a compact representation of joint probability distributions, built from a directed acyclic graph and local conditional probability tables. After a quick refresher using the classic burglary-earthquake-alarm example, the professor reviews exact and approximate inference methods, marginalization, rejection sampling, and Gibbs sampling, and introduces d-separation rules that determine conditional independence. Key insights include how independence is read off the graph: a path is blocked when a node is conditioned on (or when neither a V-structure's node nor its descendants are conditioned on), enabling parallel computation during inference.

The instructor then shifts to learning: with fully observed data, maximum-likelihood estimates of each conditional table are obtained by simply counting occurrences of each variable configuration and normalizing to sum to one. Illustrative examples progress from a single-node network modeling movie ratings, to a two-node network adding genre, and finally a three-node network that incorporates awards. In each case, the learning algorithm iterates over the dataset, updates counts for the relevant parent-child configurations, and normalizes to produce the conditional probability tables, demonstrating that even complex structures follow the same count-and-normalize pattern.

The practical implication is that Bayesian networks become data-driven models once their parameters are learned, allowing scalable probabilistic reasoning in real-world domains such as recommendation systems, fault diagnosis, and causal analysis. Mastery of conditional independence and efficient parameter estimation is essential for deploying reliable AI systems that can reason under uncertainty.
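The parent-child counting loop described in the summary can be sketched as follows. The dataset, variable names, and the genre-to-rating structure are illustrative stand-ins for the lecture's two-node example:

```python
from collections import defaultdict

def learn_cpt(data, child, parents):
    """Estimate P(child | parents) from fully observed data:
    count each (parent configuration, child value) pair, then
    normalize each row of the conditional probability table."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        parent_config = tuple(row[p] for p in parents)
        counts[parent_config][row[child]] += 1
    cpt = {}
    for parent_config, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_config] = {v: c / total for v, c in child_counts.items()}
    return cpt

# Hypothetical observations for a two-node network: genre -> rating
data = [
    {"genre": "drama",  "rating": 4},
    {"genre": "drama",  "rating": 5},
    {"genre": "comedy", "rating": 3},
    {"genre": "comedy", "rating": 3},
]
print(learn_cpt(data, "rating", ["genre"]))
# {('drama',): {4: 0.5, 5: 0.5}, ('comedy',): {3: 1.0}}
```

Extending to the three-node network (e.g. adding awards as a second parent) changes nothing in the algorithm: the parent configuration tuple simply grows by one element, which is why the lecture stresses that all structures reduce to the same count-and-normalize pattern.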