Stanford CS336 Language Modeling From Scratch | Spring 2026 | Lecture 2: PyTorch (Einops)
Why It Matters
Understanding compute and precision trade‑offs lets AI teams train larger models faster and cheaper, directly impacting research productivity and commercial deployment costs.
Key Takeaways
- Training FLOPs ≈ 6 × parameters × tokens gives a quick training-cost estimate
- At ~50% MFU (model FLOPs utilization), H100 GPUs can train a 70B model in roughly 143 days
- BF16 is the practical sweet spot between memory use and numerical stability
- Mixed-precision training keeps optimizer states in FP32 while using BF16 for activations and gradients
- einops names tensor dimensions explicitly, reducing manual-indexing errors
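The FLOPs-based estimate above can be sketched as a few lines of arithmetic. In this sketch the token count, GPU count, and per-GPU peak throughput are illustrative assumptions (they are not stated in this summary), chosen to show how the 6 × N × T rule turns into a wall-clock estimate:

```python
# Back-of-envelope training time from the 6 * N * T FLOPs rule.
# GPU count, token count, and peak throughput below are assumptions for illustration.
def training_days(params: float, tokens: float, n_gpus: int,
                  peak_flops_per_sec: float, mfu: float) -> float:
    total_flops = 6 * params * tokens                 # forward + backward estimate
    effective_rate = n_gpus * peak_flops_per_sec * mfu
    return total_flops / effective_rate / 86_400      # 86,400 seconds per day

# Example: 70B parameters, 15T tokens (assumed), 1024 H100s (assumed)
# at ~989 TFLOP/s BF16 peak, 50% MFU -> on the order of 140+ days.
days = training_days(70e9, 15e12, 1024, 989e12, 0.5)
```

Plugging in different cluster sizes or MFU values shows how sensitive the schedule is to hardware utilization.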
Summary
The lecture focused on resource accounting for large language‑model training, covering how to estimate compute, memory needs, and precision choices using PyTorch and the einops library. Professor Wang introduced a simple formula—FLOPs ≈ 6 × parameters × tokens—to gauge training cost, then applied it to a 70‑billion‑parameter model on H100 GPUs, arriving at roughly 143 days of compute. He explained tensor memory basics, comparing FP32, FP16, BF16, and newer formats like FP8 and FP4, emphasizing BF16 as the practical sweet spot for most workloads, and described mixed‑precision training, which keeps optimizer states in FP32 while using BF16 for activations and gradients. The session also demonstrated einops for named‑dimension tensor operations, showing how it avoids the pitfalls of manual index manipulation. By mastering these calculations and tools, students can design training pipelines that maximize hardware efficiency, control costs, and scale models responsibly.
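The mixed-precision memory accounting can be made concrete with a short sketch. The byte counts below follow a common recipe (BF16 weights and gradients, plus FP32 master weights and two Adam moments); this is an assumed layout for illustration, and exact allocations vary by framework and optimizer:

```python
# Hedged sketch of per-parameter memory in mixed-precision training with Adam.
# Layout is an assumption (bf16 weights/grads, fp32 master copy + two Adam moments).
BYTES_PER_PARAM = {
    "bf16_weight": 2,
    "bf16_grad": 2,
    "fp32_master_weight": 4,
    "fp32_adam_m": 4,   # first-moment estimate
    "fp32_adam_v": 4,   # second-moment estimate
}

def training_memory_gb(n_params: float) -> float:
    """Approximate memory for parameters + optimizer state, ignoring activations."""
    bytes_per_param = sum(BYTES_PER_PARAM.values())   # 16 bytes per parameter
    return n_params * bytes_per_param / 1e9

# Example: a 70B-parameter model needs ~1120 GB for weights and optimizer state alone.
mem_gb = training_memory_gb(70e9)
```

This is why optimizer states, not the BF16 weights themselves, dominate training memory, and why they must be sharded across many GPUs.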
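The einops point can be illustrated with a minimal sketch. The shapes and axis names here are assumptions for demonstration (NumPy is used for portability; einops works the same way on PyTorch tensors):

```python
# Minimal sketch of einops' named-dimension style; shapes are illustrative.
import numpy as np
from einops import rearrange, reduce

x = np.zeros((2, 3, 4))                      # (batch, seq, hidden)

# Flatten seq and hidden into one axis -- no fragile x.reshape(2, -1) guesswork.
flat = rearrange(x, "b s d -> b (s d)")      # shape (2, 12)

# Mean-pool over the sequence axis, named explicitly in the pattern.
pooled = reduce(x, "b s d -> b d", "mean")   # shape (2, 4)
```

Because the pattern string names every dimension, a shape mismatch fails loudly at the call site instead of silently producing a wrongly-strided tensor.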