Stanford CS25: Transformers United V6 I Distinct Modes of Generalization From Parameters and Context

Stanford Online
May 20, 2026

Why It Matters

The gap between what models generalize from fine‑tuning and what they generalize from in‑context learning limits LLM reliability on reasoning tasks, and it motivates new architectural, training, and prompting strategies aimed at more human‑like inference.

Key Takeaways

  • Contextual prompting yields near‑perfect reversal accuracy; fine‑tuning does not
  • In‑context learning generalizes better on syllogistic and codebook tasks
  • Models trained from scratch on the facts still fail to infer unseen reversals from parameter updates alone
  • The reversal curse is tied to the causal next‑token prediction objective, not to data scarcity
  • Data augmentation, retrieval, and RL‑based strategies can help bridge the gap

Summary

The talk by Andrew Lampinen explores how large language models (LLMs) generalize knowledge differently when it is stored in model parameters versus when it is supplied in the prompt context. By replicating the "reversal curse"—where fine‑tuned models struggle to answer inverse relational queries—he shows that simply feeding the same facts as context enables 99% accuracy, highlighting a stark contrast between parameter‑based and context‑based learning.
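A minimal sketch of how a reversal experiment of this kind can be set up, assuming fictional entities (every name and title below is hypothetical) and a simple "X is Y" template. The fine‑tuning condition would train on `train` and then ask the questions in `test`; the in‑context condition places the same statements in the prompt. The model calls themselves are left out, and the helper names are illustrative, not from the talk.

```python
# Hypothetical made-up entities, so a pretrained model has no prior knowledge of them.
NAMES = ["Daphne Barrington", "Orlan Vesk", "Tamsin Quell", "Brio Fenwick"]
ROLES = ["the director of 'Abyssal Melodies'", "the composer of 'Glass Tides'",
         "the author of 'Salt Ledger'", "the founder of Kestrel Labs"]


def make_split(names, roles):
    """Return forward training statements and reversed test queries."""
    facts = list(zip(names, roles))
    # Training data only ever states the relation in the name -> description direction.
    train = [f"{name} is {role}." for name, role in facts]
    # Test queries ask for the inverse direction: description -> name.
    test = [(f"Who is {role}?", name) for name, role in facts]
    return train, test


def in_context_prompt(train, question):
    """Supply the same facts in the prompt instead of through fine-tuning."""
    return "\n".join(train) + f"\n\nQuestion: {question}\nAnswer:"


train, test = make_split(NAMES, ROLES)
print(in_context_prompt(train, test[0][0]))
```

Scoring both conditions on the same `test` pairs keeps the comparison about the route by which the facts arrive, not about the facts themselves.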

Across several experiments, Lampinen compares fine‑tuning against in‑context learning on tasks such as relational reversals, syllogistic reasoning, and codebook translation. Fine‑tuned models hover near chance on the held‑out inferences, while models given the same information in context consistently perform well, even on novel logical implications. Training a small model from scratch shows that the limitation is not merely a lack of fine‑tuning data: these models still cannot infer unseen reversals despite abundant exposure to the forward facts.
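To make the syllogistic setup concrete, here is a sketch of how held‑out test items could be generated. The model sees only the stated premises (in fine‑tuning data or in the prompt) and is then tested on conclusions that follow from chaining them. The "All X are Y" template and the nonsense category names are assumptions for illustration, not the talk's exact stimuli.

```python
from itertools import product


def implied_conclusions(premises):
    """Given 'All X are Y' premises as (X, Y) pairs, return the (X, Z) conclusions
    that follow by chaining premises but were never stated directly."""
    closure = set(premises)
    changed = True
    while changed:  # repeat until no new transitive link can be added
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return sorted(closure - set(premises))


# Hypothetical nonsense categories, so the model cannot lean on world knowledge.
premises = [("glon", "yomp"), ("yomp", "troff"), ("troff", "zell")]
for x, z in implied_conclusions(premises):
    print(f"All {x} are {z}.")
# prints: All glon are troff. / All glon are zell. / All yomp are zell.
```

Only the derived conclusions go into the test set, so a correct answer requires combining premises rather than recalling one of them.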

Key observations include that LLMs implicitly learn to manipulate relational structures present in natural text and can apply that skill to information in their context, yet they do not internalize the latent inference rules from parameter updates. The causal, left‑to‑right next‑token prediction objective exacerbates the reversal curse, whereas bidirectional transformers or modified training objectives can mitigate it. The research suggests that test‑time compute or hybrid approaches may bridge the generalization gap.
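The objective‑level argument can be made concrete by listing the (prefix, next‑token) pairs a causal model is trained on for a single forward statement. In this toy sketch (word‑level tokens and a hypothetical entity), the entity's name is never a prediction target while the description is already visible, so that sentence alone never directly trains the reverse mapping. It illustrates the argument only; it is not a model of what the network learns internally.

```python
def next_token_pairs(tokens):
    """All (prefix, target) pairs a causal LM is trained on for one sequence."""
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def ever_predicts(pairs, target, given):
    """True if some training pair asks the model to predict `target`
    with `given` already visible in the prefix."""
    return any(t == target and given in prefix for prefix, t in pairs)


# Word-level tokens for illustration; real models use subword tokenizers.
sentence = ["Orlan", "Vesk", "is", "the", "composer", "of", "Glass", "Tides"]
pairs = next_token_pairs(sentence)

print(ever_predicts(pairs, target="Glass", given="Orlan"))  # True: forward direction is trained
print(ever_predicts(pairs, target="Orlan", given="Glass"))  # False: reverse direction never is
```

A bidirectional objective such as masked‑token prediction would also produce pairs where "Orlan" is the target with "Glass" visible, which is one intuition for why such objectives can mitigate the effect.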

These findings imply that practitioners should leverage in‑context prompting for tasks requiring flexible relational reasoning, and that future model designs may need to incorporate mechanisms beyond standard fine‑tuning to capture latent structures. Understanding the divergence between parameter and context generalization also offers a window into parallels between artificial and natural intelligence.
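One way to act on this is to keep facts outside the weights and retrieve them into the prompt at query time, so that the model's in‑context generalization does the relational work. The sketch below assumes a toy word‑overlap scorer in place of a real retriever (such as BM25 or dense embeddings); `retrieve` and `build_prompt` are hypothetical helpers, not a library API.

```python
import re


def tokenize(text):
    """Lowercase word set; punctuation and quotes are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(facts, question, k=2):
    """Rank stored facts by word overlap with the question (toy stand-in for a retriever)."""
    q = tokenize(question)
    ranked = sorted(facts, key=lambda f: len(q & tokenize(f)), reverse=True)
    return ranked[:k]


def build_prompt(facts, question, k=2):
    """Place the top-k retrieved facts in context ahead of the question."""
    context = "\n".join(retrieve(facts, question, k))
    return f"{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical fact store; the question asks in the reverse direction.
facts = ["Orlan Vesk is the composer of 'Glass Tides'.",
         "Tamsin Quell is the author of 'Salt Ledger'.",
         "Brio Fenwick is the founder of Kestrel Labs."]
print(build_prompt(facts, "Who composed 'Glass Tides'?", k=1))
```

Because the relevant fact arrives in context, answering the reversed question no longer depends on the model having stored the inverse relation in its parameters.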

Original Description

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
May 7, 2026
This seminar covers:
• Two methods for teaching information to language models: training (updating parameters) or in-context learning (providing information in prompts)
• Striking differences in the types of generalization that models make when they learn information via these two routes
• Three different strategies that can help bridge the gap, based on data augmentation, retrieval, and RL
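Data augmentation is the simplest of these strategies to sketch. The version below is a deliberately crude, assumed stand‑in: it rewrites each "X is Y" training statement as "Y is X" with a fixed template and adds both directions to the fine‑tuning set. The approach in the talk may instead use the model's own in‑context inferences to generate the extra data; the helper names here are hypothetical.

```python
def reverse_statement(statement):
    """Rewrite 'X is Y.' as 'Y is X.' using a crude template (assumed for illustration)."""
    subject, _, rest = statement.rstrip(".").partition(" is ")
    if not rest:
        return None  # not in the 'X is Y' form; leave it to a smarter augmenter
    return f"{rest[0].upper()}{rest[1:]} is {subject}."


def augment(train):
    """Return the training statements plus a reversed copy of each one that fits the template."""
    extra = [r for s in train if (r := reverse_statement(s)) is not None]
    return train + extra


print(augment(["Orlan Vesk is the composer of 'Glass Tides'."]))
```

Augmenting the training data this way moves the needed inference into the data before fine‑tuning, rather than relying on the model to derive it from parameter updates.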
Follow along with the seminar schedule. Visit: https://web.stanford.edu/class/cs25/
Guest Speaker: Andrew Lampinen (Anthropic)
Instructors:
• Steven Feng, Stanford Computer Science PhD student and NSERC PGS-D scholar
• Karan P. Singh, Electrical Engineering PhD student and NSF Graduate Research Fellow in the Stanford Translational AI Lab
• Michael C. Frank, Benjamin Scott Crocker Professor of Human Biology; Director, Symbolic Systems Program
• Christopher Manning, Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science, Co-Founder and Senior Fellow of the Stanford Institute for Human-Centered Artificial Intelligence (HAI)
