Stanford CS221 | Autumn 2025 | Lecture 2: Learning I
Why It Matters
Grasping einsum and gradient computation demystifies core operations in deep‑learning libraries, accelerating model development and optimization.
Key Takeaways
- Einops names tensor axes, making computations in models clearer.
- einsum expresses identity, summation, elementwise products, dot products, and outer products in one notation.
- Output axes must be a subset of input axes; axes omitted from the output are summed over.
- Gradients of a scalar loss guide weight updates via backpropagation.
- A linear regression loss built from matrix-vector operations illustrates gradient flow.
Summary
The lecture introduces tensors and the einops library, emphasizing how naming axes clarifies operations on tensors of any order. It then dives into the einsum function, showing how a single notation can express identity mappings, summations, element-wise products, dot products, outer products, and common matrix manipulations such as transposes and matrix-vector multiplications.
Key insights include the concept of tensor order (scalar, vector, matrix, higher‑dimensional tensors) and the rule that output axes must be a subset of input axes, which drives reduction behavior (e.g., summing over a missing axis). The instructor demonstrates practical einsum strings, like "i->i" for identity, "i->" for summing a vector to a scalar, and "ij->ji" for transposition, reinforcing the uniform "plus‑equals" accumulation pattern.
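The einsum strings described above can be tried directly in NumPy; a minimal sketch (the input values here are illustrative, not the lecture's):

```python
import numpy as np

x = np.array([0, 1, 10])

# "i->i": identity; the output keeps the input axis unchanged.
identity = np.einsum("i->i", x)

# "i->": the i axis is missing from the output, so einsum sums over it.
total = np.einsum("i->", x)  # 0 + 1 + 10 = 11

# "i,i->i": elementwise product; no axis is dropped, so nothing is reduced.
squared = np.einsum("i,i->i", x, x)

# "i,i->": dot product; multiply elementwise, then sum over the missing i.
dot = np.einsum("i,i->", x, x)

# "i,j->ij": outer product; both axes survive, so no reduction occurs.
outer = np.einsum("i,j->ij", x, x)

# "ij->ji": transpose; axes are reordered but none are dropped.
A = np.arange(6).reshape(2, 3)
At = np.einsum("ij->ji", A)
```

Every case follows the same "plus-equals" accumulation pattern: loop over all axes, and accumulate into the output cell named by the output axes.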
Concrete examples illustrate the mechanics: summing a vector [0,1,10] yields 11; an outer product creates a matrix from two vectors; a matrix‑vector product computes predictions in a linear regression model, leading to a scalar loss of 5. The loss function is expressed as a one‑liner, and evaluating it at different weight vectors shows how the loss changes, setting the stage for gradient‑based optimization.
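The loss one-liner can be sketched with einsum as follows; the data here is made up for illustration, so the loss values differ from the lecture's value of 5:

```python
import numpy as np

# Hypothetical data: 3 examples, 2 features (not the lecture's actual values).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    # Predictions via the matrix-vector product "ij,j->i",
    # then squared residuals summed to a scalar via "i,i->".
    r = np.einsum("ij,j->i", X, w) - y
    return np.einsum("i,i->", r, r)

# Evaluating at different weight vectors shows how the loss changes.
l_zero = loss(np.array([0.0, 0.0]))  # residuals -1, -2, -3 -> 1 + 4 + 9 = 14
l_fit = loss(np.array([1.0, 2.0]))   # predictions match y exactly -> 0
```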
Understanding einsum and tensor naming equips students to implement backpropagation efficiently, as gradients of scalar loss functions with respect to parameters are computed via computation graphs. Mastery of these fundamentals underpins modern machine‑learning frameworks and enables rapid prototyping of complex models.
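Frameworks compute such gradients automatically from a computation graph; as a minimal sketch of the same idea, the gradient of the squared loss can be derived by hand (dL/dw = 2 Xᵀ(Xw - y), assuming the hypothetical data below) and written in the same einsum notation:

```python
import numpy as np

# Hypothetical data, not the lecture's actual values.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    r = np.einsum("ij,j->i", X, w) - y
    return np.einsum("i,i->", r, r)

def grad(w):
    # Hand-derived gradient of the squared loss: 2 * X^T (Xw - y),
    # expressed as a sum over the example axis i via "ij,i->j".
    r = np.einsum("ij,j->i", X, w) - y
    return 2 * np.einsum("ij,i->j", X, r)

# One gradient-descent step should decrease the loss.
w = np.array([0.0, 0.0])
w_new = w - 0.1 * grad(w)
```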