
Allen School Colloquium: Test-Time Training

UW CSE (Allen School) • March 9, 2026

Why It Matters

By letting models keep learning during inference, test-time training substantially improves efficiency on long-context tasks and could reshape how large language models are deployed in real-world applications.

Key Takeaways

  • Test-time training lets models adapt effectively during inference.
  • Sliding-window attention reduces latency but loses long-range information.
  • Backpropagation at test time compresses distant context into model weights.
  • The method becomes faster than full-context transformers beyond roughly 32,000 tokens.
  • Meta-learning trains models to excel at "take-home test" scenarios.

Summary

The colloquium introduced test-time training, a paradigm in which models continue to learn while deployed. Yu Sun, a postdoctoral researcher at Stanford and NVIDIA, traced the idea back to his 2019 PhD work and explained how it mirrors the "take-home test" approach: rather than answering from memory alone at inference time, a model updates itself using the data available at test time.

Traditional machine-learning pipelines consist of pre-training, fine-tuning, and a static testing phase. That static phase becomes a bottleneck on long-context inputs such as legal documents or code bases: sliding-window attention keeps latency constant but discards everything outside the window, which drives up loss. Sun's solution runs a backward pass on each new token, using the same next-token loss as pre-training, thereby compressing the out-of-window context into the model's weights, as the sketch below illustrates.
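A minimal sketch of that update loop, assuming a Hugging Face-style PyTorch causal language model (the model interface, window size, and learning rate below are illustrative assumptions, not details from the talk):

    import torch
    import torch.nn.functional as F

    def test_time_train(model, ids, window=512, lr=1e-4):
        # Hypothetical loop: model is a causal LM exposing a Hugging
        # Face-style .logits output; ids is a (1, seq_len) token tensor.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for t in range(1, ids.size(1)):
            ctx = ids[:, max(0, t - window):t]         # sliding attention window
            logits = model(ctx).logits[:, -1, :]       # predict the next token
            loss = F.cross_entropy(logits, ids[:, t])  # same loss as pre-training
            opt.zero_grad()
            loss.backward()                            # backward pass at test time
            opt.step()                                 # fold old context into weights
        return model

The key design point is that no new objective is introduced: the model keeps optimizing the pre-training loss, so tokens that have slid out of the attention window remain accessible through the updated weights.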

He illustrated the concept with anecdotes, from Chinese exam culture to Andrew Wiles's decades-long proof, highlighting how learning on the job can be more powerful than rote preparation. Empirically, beyond roughly 32,000 tokens the test-time training approach becomes faster than a full-context transformer, achieving a 2.7× speed-up in pre-fill and up to a six-fold acceleration during decoding while maintaining comparable loss.
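A back-of-envelope comparison (with made-up constants, not measured numbers) shows why such a crossover must exist: full attention over n tokens costs on the order of n^2, while a fixed window plus a constant-cost weight update per token costs on the order of n:

    window = 512       # hypothetical attention window size
    update = 31_500    # hypothetical per-token backward-pass cost (relative units)
    for n in (8_000, 32_000, 128_000):
        full = n * n                 # quadratic full-context attention
        ttt = n * (window + update)  # linear sliding window + per-token update
        print(f"n={n:>7,}  full/ttt = {full / ttt:.2f}x")

With these illustrative constants the linear pipeline breaks even near 32,000 tokens and pulls further ahead as the context grows, matching the shape of the result reported in the talk, though not its actual constants.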

If adopted broadly, test‑time training could shift the frontier of AI from static inference toward continual adaptation, reducing latency for long‑context tasks and opening new avenues for meta‑learning and continual learning research.

Original Description

Title: Test-Time Training
Speaker: Yu Sun (Stanford University)
Date: Thursday, March 5, 2026
Abstract: Most AI models are trained only before the test instances arrive and then fixed during deployment, even though making good predictions on test instances is the ultimate goal of training. What if we continue to train a model after each test instance arrives? In this talk, we discuss how this conceptual framework, known as test-time training, leads to long-term memory that scales differently with context length, and enables AI to discover new results on open scientific problems.
Bio: Yu Sun is a postdoc at Stanford University and a researcher at NVIDIA. His research focuses on continual learning, specifically a conceptual framework known as test-time training, where each test instance defines its own learning problem. Yu obtained his PhD in EECS from UC Berkeley and BS in CS from Cornell University.
This video is in the process of being closed captioned.
