Titans: Learning to Memorize at Test Time (Paper Analysis)

Yannic Kilcher • December 14, 2025

Why It Matters

Titans offers a practical path to extend transformer context windows without prohibitive compute, unlocking new commercial use‑cases for long‑form AI applications.

Summary

The video reviews Google Research’s “Titans: Learning to Memorize at Test Time,” a NeurIPS paper that proposes a novel architecture enabling language models to retain information beyond their fixed context window. The presenter explains that the model treats the keys and values of past tokens as a dynamic memory, accessed during inference to retrieve distant context, thereby addressing the quadratic scaling limits of standard transformers.
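The retrieval idea described above can be sketched in a few lines. This is an illustrative toy, not the paper's architecture: it keeps the key/value pairs of tokens that have left the attention window in an external store and, at inference time, soft-attends over that store to pull back distant context. All names and dimensions are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class KVMemory:
    """Hypothetical store of past key/value pairs, queried at inference."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, k, v):
        # Tokens evicted from the fixed window land here instead of being lost.
        self.keys.append(k)
        self.values.append(v)

    def read(self, q):
        # Soft attention over all stored keys retrieves far-back context.
        K = np.stack(self.keys)                # (n, d)
        V = np.stack(self.values)              # (n, d)
        w = softmax(K @ q / np.sqrt(len(q)))   # attention weights over memory
        return w @ V                           # (d,) retrieved context vector

rng = np.random.default_rng(0)
mem = KVMemory()
for _ in range(8):
    mem.write(rng.normal(size=4), rng.normal(size=4))
out = mem.read(rng.normal(size=4))
```

The point of the sketch is the access pattern: the in-window attention stays unchanged, while queries can additionally read from a memory whose size is decoupled from the quadratic window.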

Key technical insights include a critique of prior "linear transformer" approaches that approximate soft‑max attention with kernel tricks; the paper argues these approaches compress the entire history into a single matrix‑valued state and suffer from performance degradation. Instead, Titans introduces a neural‑network‑based memory module that learns to store and retrieve representations of earlier segments, allowing the model to attend both to the immediate window and to a learned external memory without sacrificing the expressive power of full attention.
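The "learns to memorize at test time" idea can be illustrated with a deliberately simplified version of such a memory: a single linear map M from keys to values, updated online by a gradient step on the reconstruction loss ||Mk − v||². This is a sketch under strong assumptions; the paper's actual memory is a deeper network with momentum and forgetting terms, and the learning rate here is arbitrary.

```python
import numpy as np

class NeuralMemory:
    """Toy test-time-trained memory: a linear map updated by online SGD."""

    def __init__(self, d, lr=0.1):
        self.M = np.zeros((d, d))
        self.lr = lr

    def update(self, k, v):
        # Prediction error acts as a "surprise" signal: large errors
        # drive large writes, already-memorized pairs barely change M.
        err = self.M @ k - v
        self.M -= self.lr * np.outer(err, k)   # grad of 0.5 * ||Mk - v||^2

    def retrieve(self, q):
        return self.M @ q

d = 4
mem = NeuralMemory(d)
k = np.eye(d)[0]                       # a fixed key
v = np.array([1.0, 2.0, 3.0, 4.0])     # the value to memorize
for _ in range(200):
    mem.update(k, v)                   # memorization happens at inference time
r = mem.retrieve(k)
```

Because the update is a plain gradient step, "training" the memory and running inference are the same operation, which is what distinguishes this family from a frozen-weights transformer.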

The reviewer highlights several illustrative examples: the model can process a document far longer than its attention window by chunking it, feeding each chunk sequentially while the memory network accumulates salient information, and then using that memory to answer questions that reference far‑back passages. Notable quotes from the paper—such as "memory is a neural network"—and the speaker's observation that "the marketing sometimes overstates novelty" underscore the tension between genuine innovation and re‑branding of existing techniques.
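The chunked-processing flow described above can be sketched as a simple loop: split the document into windows, let each window update a small persistent state, and use that state later for answering. The `embed` function and the additive memory rule are placeholders chosen for the sketch, not the paper's method.

```python
import numpy as np

def embed(text, d=8):
    # Deterministic toy embedding: bucket characters into a fixed vector.
    v = np.zeros(d)
    for i, ch in enumerate(text):
        v[(i + ord(ch)) % d] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def process_document(doc, window=16):
    """Feed the document chunk by chunk; the state persists across chunks."""
    state = np.zeros(8)
    for start in range(0, len(doc), window):
        chunk = doc[start:start + window]
        # Memory accumulates information across chunks (here: a plain sum;
        # the real module would decide what is salient enough to keep).
        state += embed(chunk)
    return state

doc = "an example passage repeated to exceed the window " * 10
state = process_document(doc)
```

Only `state` survives from chunk to chunk, so the per-step cost stays bounded by the window size regardless of total document length.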

Implications for the AI industry are significant. If Titans’ memory mechanism scales, it could enable cost‑effective deployment of large language models on hardware with limited memory, broaden applicability to tasks like long‑form document analysis, video understanding, and real‑time decision making, and potentially shift research focus toward train‑time‑efficient, test‑time‑adaptive architectures.

Original Description

Paper: https://arxiv.org/abs/2501.00663
Abstract:
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short-term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture. Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines.
Authors: Ali Behrouz, Peilin Zhong, Vahab Mirrokni
Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
