
Stanford CS336 Language Modeling From Scratch | Spring 2026 | Guest Lecture: Dan Fu

June 5, 2026

Stanford Online

Why It Matters

Optimizing inference is central to making large language models practical and economical at scale, affecting product latency, cloud costs, and who can deploy advanced AI. Advances in inference software and GPU utilization will shape industry competitive advantage and broaden real‑world AI applications.

Summary

Dan Fu, guest lecturing for Stanford CS336, outlined the engineering and research challenges of serving large language models, focusing on the end-to-end “lifetime of a token” from request to GPU-backed inference. He argued that scale and GPU capacity have driven recent leaps in capability and that inference (the software and kernels that map model operations to hardware) is the critical engine that converts compute into usable intelligence. Fu described how understanding inference stacks enables full‑stack ML innovation and previewed research from UCSD and Together on optimization techniques and system design. He framed these technical problems as fertile ground for improving latency, reducing cost, and enabling new multimodal capabilities.

Original Description

For more information about Stanford's online Artificial Intelligence programs, visit: https://stanford.io/ai

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs336-language-modeling-scratch

Follow along with the course schedule and syllabus: https://cs336.stanford.edu/

Percy Liang

Professor of Computer Science (and courtesy in Statistics)

Tatsunori Hashimoto

Assistant Professor of Computer Science

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV
