Stanford CS221 | Autumn 2025 | Lecture 2: Learning I
Why It Matters
Grasping einsum and gradient computation demystifies core operations in deep‑learning libraries, accelerating model development and optimization.
Key Takeaways
- Einops names tensor axes, making computations in models clearer.
- einsum expresses identity, summation, elementwise products, dot products, and outer products in one notation.
- Output axes must be a subset of input axes; axes omitted from the output are summed over.
- Gradients of a scalar loss guide weight updates via backpropagation.
- A linear regression loss built from matrix-vector operations illustrates gradient flow.
Summary
The lecture introduces tensors and the einops library, emphasizing how naming axes clarifies operations on tensors of any order. It then dives into the einsum function, showing how a single notation can express identity mappings, summations, element-wise products, dot products, outer products, and common matrix manipulations such as transposes and matrix-vector multiplications.
Key insights include the concept of tensor order (scalar, vector, matrix, higher‑dimensional tensors) and the rule that output axes must be a subset of input axes, which drives reduction behavior (e.g., summing over a missing axis). The instructor demonstrates practical einsum strings, like "i->i" for identity, "i->" for summing a vector to a scalar, and "ij->ji" for transposition, reinforcing the uniform "plus‑equals" accumulation pattern.
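The einsum strings described above can be tried directly in NumPy; a minimal sketch (the input values here are illustrative, not the lecture's):

```python
import numpy as np

x = np.array([0, 1, 10])

# "i->i": identity; the output keeps the input axis unchanged.
identity = np.einsum("i->i", x)

# "i->": the i axis is missing from the output, so einsum sums over it.
total = np.einsum("i->", x)  # 0 + 1 + 10 = 11

# "i,i->i": elementwise product; no axis is dropped, so nothing is reduced.
squared = np.einsum("i,i->i", x, x)

# "i,i->": dot product; multiply elementwise, then sum over the missing i.
dot = np.einsum("i,i->", x, x)

# "i,j->ij": outer product; both axes survive, so no reduction occurs.
outer = np.einsum("i,j->ij", x, x)

# "ij->ji": transpose; axes are reordered but none are dropped.
A = np.arange(6).reshape(2, 3)
At = np.einsum("ij->ji", A)
```

Every case follows the same "plus-equals" accumulation pattern: loop over all axes, and accumulate into the output cell named by the output axes.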
Concrete examples illustrate the mechanics: summing a vector [0,1,10] yields 11; an outer product creates a matrix from two vectors; a matrix‑vector product computes predictions in a linear regression model, leading to a scalar loss of 5. The loss function is expressed as a one‑liner, and evaluating it at different weight vectors shows how the loss changes, setting the stage for gradient‑based optimization.
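The loss one-liner can be sketched with einsum as follows; the data here is made up for illustration, so the loss values differ from the lecture's value of 5:

```python
import numpy as np

# Hypothetical data: 3 examples, 2 features (not the lecture's actual values).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    # Predictions via the matrix-vector product "ij,j->i",
    # then squared residuals summed to a scalar via "i,i->".
    r = np.einsum("ij,j->i", X, w) - y
    return np.einsum("i,i->", r, r)

# Evaluating at different weight vectors shows how the loss changes.
l_zero = loss(np.array([0.0, 0.0]))  # residuals -1, -2, -3 -> 1 + 4 + 9 = 14
l_fit = loss(np.array([1.0, 2.0]))   # predictions match y exactly -> 0
```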
Understanding einsum and tensor naming equips students to implement backpropagation efficiently, as gradients of scalar loss functions with respect to parameters are computed via computation graphs. Mastery of these fundamentals underpins modern machine‑learning frameworks and enables rapid prototyping of complex models.
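Frameworks compute such gradients automatically from a computation graph; as a minimal sketch of the same idea, the gradient of the squared loss can be derived by hand (dL/dw = 2 Xᵀ(Xw - y), assuming the hypothetical data below) and written in the same einsum notation:

```python
import numpy as np

# Hypothetical data, not the lecture's actual values.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    r = np.einsum("ij,j->i", X, w) - y
    return np.einsum("i,i->", r, r)

def grad(w):
    # Hand-derived gradient of the squared loss: 2 * X^T (Xw - y),
    # expressed as a sum over the example axis i via "ij,i->j".
    r = np.einsum("ij,j->i", X, w) - y
    return 2 * np.einsum("ij,i->j", X, r)

# One gradient-descent step should decrease the loss.
w = np.array([0.0, 0.0])
w_new = w - 0.1 * grad(w)
```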