
Stanford CS336 Language Modeling From Scratch | Spring 2026 | Lecture 4: Attention Alternatives
The lecture covered advanced transformer architectures, focusing on attention alternatives that achieve linear‑time complexity and on mixture‑of‑experts (MoE) layers that boost parameter efficiency. Professor Kumar explained why quadratic attention costs come to dominate as context length grows and introduced techniques for curbing them, including exploiting the associativity of matrix multiplication, FlashAttention, and hybrid local‑global schemes. Key insights included the reordering of (Q·Kᵀ)·V into Q·(Kᵀ·V), which shifts the dominant cost from O(N²·D) to O(N·D²), and the observation that FlashAttention delivers dramatic constant‑factor gains without changing the asymptotic complexity. Hybrid models like MiniMax M1 interleave several linear‑attention layers with a single full softmax layer, achieving competitive performance against large models such as GPT‑3. The professor highlighted state‑space approaches such as Mamba 2 and Gated‑DeltaNet, which add input‑dependent gates (γₜ) to the linear recurrence, preserving parallel training while enabling fast recurrent inference. Open‑source frontier models (Neon 3 and Gated‑DeltaNet‑based systems) demonstrate that these gated recurrences deliver high throughput at long context windows. Overall, the shift toward linear‑time attention and MoE architectures promises scalable, cost‑effective language models capable of handling tens of millions of tokens, opening new possibilities for complex AI agents and enterprise applications.
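The associativity reordering described above can be sketched in a few lines. This is a minimal illustration with made-up sizes N and D and no softmax, since the identity only holds for the un-normalized, linear-attention form of the computation:

```python
import numpy as np

# Illustrative sizes, not from the lecture: sequence length N, head dim D.
N, D = 2048, 64
rng = np.random.default_rng(0)
Q = rng.standard_normal((N, D))
K = rng.standard_normal((N, D))
V = rng.standard_normal((N, D))

# Quadratic order: (Q @ K.T) materializes an N x N matrix -> O(N^2 * D) work.
out_quadratic = (Q @ K.T) @ V

# Linear order: K.T @ V is only a D x D matrix -> O(N * D^2) work.
out_linear = Q @ (K.T @ V)

# Matrix-multiplication associativity guarantees both orderings agree;
# inserting a row-wise softmax between Q @ K.T and V would break this.
assert np.allclose(out_quadratic, out_linear)
```

Since D is fixed while N grows with context length, the second ordering is what makes "linear attention" linear in sequence length.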

Stanford CS336 Language Modeling From Scratch | Spring 2026 | Lecture 3: Architectures
The lecture surveys modern transformer architectures, emphasizing how design choices have crystallized around stability and scalability. Starting from the original Vaswani transformer, the instructor traces the shift from post‑norm residual placement to pre‑norm, noting that moving layer‑norm outside the residual...

Stanford CS336 Language Modeling From Scratch | Spring 2026 | Lecture 2: PyTorch (Einops)
The lecture focused on resource accounting for large language‑model training, covering how to estimate compute, memory needs, and precision choices using PyTorch and the einops library. Professor Wang introduced a simple formula—flops equal six times the number of parameters times the...
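The summary's FLOPs formula is cut off above; assuming it is the widely used estimate FLOPs ≈ 6 × parameters × tokens (roughly 2× for the forward pass and 4× for the backward pass), a back-of-envelope check looks like this, with illustrative model and token counts:

```python
# Rule-of-thumb training compute, assuming the standard estimate
# FLOPs ~= 6 * parameters * tokens. Sizes below are illustrative.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# Example: a 7e9-parameter model trained on 1e12 tokens.
flops = training_flops(7e9, 1e12)
print(f"{flops:.2e} FLOPs")  # 4.20e+22 FLOPs
```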

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 2 - Score Matching
Lecture two of Stanford CME296 introduces score matching as the next‑generation framework for generative modeling, following the diffusion‑based DDPM approach covered previously. The professor revisits the goal of sampling from an unknown data distribution and contrasts the traditional reverse‑diffusion noise‑prediction...
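For reference, the standard explicit score-matching objective that this framing builds on, written from general knowledge rather than quoted from the lecture:

```latex
% Train s_\theta(x) to approximate the data score \nabla_x \log p_{\mathrm{data}}(x):
\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[
  \tfrac{1}{2}\,\bigl\| s_\theta(x) - \nabla_x \log p_{\mathrm{data}}(x) \bigr\|_2^2
\right]
```

In practice the unknown data score is sidestepped via denoising score matching, which regresses s_θ on the tractable conditional score of a noised sample; that is what connects score matching back to DDPM-style noise prediction.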

Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 1 - Diffusion
The video introduces Stanford’s CME296 course on diffusion and large vision models, taught by twin brothers with experience at Uber, Google, and Netflix. It outlines the class’s two main goals—understanding image‑generation paradigms and the training/evaluation of underlying models—while stressing the...

Stanford Robotics Seminar ENGR319 | Winter 2026 | Gen Control, Action Chunking, Moravec’s Paradox
The Stanford Robotics Seminar examined why learning from demonstration remains harder for physical robots than for symbolic AI, coining an "algorithmic Moravec's paradox" that highlights fundamental instability in continuous control. The speaker traced the recent surge in narrow manipulation capabilities...

Stanford CS547 HCI Seminar | Winter 2026 | Computational Ecosystems
The talk explores how computational ecosystems can be reshaped to align HCI work with personal values, moving beyond incremental tool improvements toward systemic redesign. The speaker argues that many persistent human problems stem from entrenched processes rather than missing technology,...

Course Overview: Systems Leadership
The video introduces a new leadership framework called systems leadership, designed for today’s fast‑changing, crisis‑laden environment. Robert Siegel explains that this approach requires leaders to internalize previously separate dualities and to see how their organizations interact with broader ecosystems. Key insights...

Course Overview - Web Security
The video introduces Stanford’s advanced cyber‑security program, co‑directed by Neil Daswani with professors Dan Boneh and Zakir Durumeric, to train professionals in defending web applications against today’s most damaging threats. It positions the course as essential for anyone who builds,...

Stanford CS221 | Autumn 2025 | Lecture 20: Fireside Chat, Conclusion
The final lecture of Stanford CS221 featured a fireside chat with instructor Percy, structured around career, life, research advice, class logistics, and a forward‑looking AI outlook. The informal format let students probe Percy’s personal journey from early MIT AI courses...

Stanford CS221 | Autumn 2025 | Lecture 19: AI Supply Chains
The Stanford CS221 lecture framed AI as a supply‑chain phenomenon, urging technologists to look beyond model design and consider the upstream resources and downstream applications that shape societal outcomes. Professor Rishi highlighted how AI now accounts for a third of...

Stanford CS221 | Autumn 2025 | Lecture 18: AI & Society
The Stanford CS221 lecture pivots from algorithms to AI’s societal footprint, arguing that the technology’s influence now rivals the printing press and steam engine. The professor stresses that AI’s rapid adoption—evidenced by ChatGPT’s 800 million weekly users—marks the early stage of...

Stanford CS221 | Autumn 2025 | Lecture 17: Language Models
The Stanford CS221 lecture 17 provides a sweeping overview of modern language models, emphasizing their ubiquity—from chat assistants and phone keyboards to code‑completion tools—and the massive scale at which they are built. Professor Kumar walks students through concrete examples such as...

Stanford CS221 | Autumn 2025 | Lecture 15: Logic I
The lecture introduces logic as the final technical pillar before the AI society module, emphasizing propositional logic as a foundational formal language for representing and reasoning about knowledge. Professor Pietschmann contrasts logical reasoning with earlier topics—search, MDPs, Bayesian networks—highlighting its deterministic...

Stanford CS221 | Autumn 2025 | Lecture 14: Bayesian Networks and Learning
The lecture revisits Bayesian networks as a compact representation of joint probability distributions, built from a directed acyclic graph and local conditional probability tables. After a quick refresher using the classic burglary‑earthquake‑alarm example, the professor reviews exact and approximate inference...
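The burglary‑earthquake‑alarm refresher lends itself to a tiny inference-by-enumeration sketch. The conditional-probability numbers below are the textbook (Russell and Norvig) values, assumed here for illustration rather than taken from the lecture:

```python
import itertools

# Classic burglary-earthquake-alarm network (textbook CPT values).
P_B = {1: 0.001, 0: 0.999}            # P(burglary)
P_E = {1: 0.002, 0: 0.998}            # P(earthquake)
P_A = {(1, 1): 0.95, (1, 0): 0.94,    # P(alarm=1 | burglary, earthquake)
       (0, 1): 0.29, (0, 0): 0.001}

def joint(b: int, e: int, a: int) -> float:
    """Joint probability factored along the DAG: P(B) P(E) P(A | B, E)."""
    p_a = P_A[(b, e)] if a == 1 else 1.0 - P_A[(b, e)]
    return P_B[b] * P_E[e] * p_a

# Exact inference by enumeration: P(B=1 | A=1), summing out E and normalizing.
num = sum(joint(1, e, 1) for e in (0, 1))
den = sum(joint(b, e, 1) for b, e in itertools.product((0, 1), repeat=2))
print(round(num / den, 3))  # 0.374
```

Enumeration is exponential in the number of hidden variables, which is why the lecture's move to approximate inference matters for larger networks.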