DeepSeek Just Fixed One Of The Biggest Problems With AI

Two Minute Papers
Mar 24, 2026

Why It Matters

Engram’s efficient lookup dramatically lowers inference costs while boosting accuracy, paving the way for affordable, on‑device AI that can scale beyond today’s cloud‑bound models.

Key Takeaways

  • DeepSeek's Engram adds fast lookup memory to transformers.
  • Engram reduces reliance on mixture-of-experts, cutting compute waste.
  • Hybrid model outperforms baselines on all evaluated benchmarks.
  • Context-aware gating prevents irrelevant retrieved facts from corrupting output.
  • Early placement of Engram yields higher accuracy; deep insertion harms performance.

Summary

The video dissects DeepSeek AI’s recent paper introducing Engram, a memory‑augmented module that gives transformer‑based models a cheap, fast lookup pantry for factual information. By embedding n‑gram representations and using multi‑head hashing, Engram sidesteps the costly, from‑scratch reasoning that current systems like ChatGPT perform for simple queries.
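The n-gram lookup described above can be pictured as a hash-addressed embedding table. The sketch below is a hypothetical illustration, not the paper's implementation: the table sizes, the number of heads, and the averaging of head outputs are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_BUCKETS = 1024   # rows per head (assumed size)
DIM = 64             # embedding width (assumed)
NUM_HEADS = 4        # independent hash heads (assumed)

# One embedding table per head; multiple heads hash the same n-gram into
# different buckets, so a collision in one head is unlikely in the others.
tables = rng.normal(size=(NUM_HEADS, NUM_BUCKETS, DIM)).astype(np.float32)

def bucket(ngram: tuple, head: int) -> int:
    """Deterministic per-head hash of a token n-gram into a bucket index."""
    return hash((head,) + ngram) % NUM_BUCKETS

def engram_lookup(token_ids: list, n: int = 2) -> np.ndarray:
    """Return one retrieved vector per position by averaging the heads'
    embeddings for the trailing n-gram ending at that position."""
    out = np.zeros((len(token_ids), DIM), dtype=np.float32)
    for t in range(len(token_ids)):
        ngram = tuple(token_ids[max(0, t - n + 1): t + 1])
        out[t] = np.mean(
            [tables[h, bucket(ngram, h)] for h in range(NUM_HEADS)],
            axis=0,
        )
    return out

retrieved = engram_lookup([5, 17, 17, 42])
print(retrieved.shape)  # (4, 64)
```

The key property is that retrieval is a constant-time hash and table read per position, which is why this kind of memory is so much cheaper than recomputing a fact through the full transformer stack.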

Key findings show that replacing a portion of the mixture‑of‑experts (MoE) architecture with Engram not only slashes compute but also improves model quality. Loss curves dip dramatically, and the hybrid system beats prior state‑of‑the‑art methods on every benchmark tested, from trivia to reading comprehension.

The presenter highlights vivid analogies, such as a Michelin-star chef forced to grow peanuts just to make a sandwich, to illustrate the inefficiency that Engram solves. Ablation experiments show that disabling the Engram memory drops trivia accuracy by 70% while leaving reading comprehension largely intact, confirming that the module acts as a factual pantry rather than a reasoning engine. A context-aware gating mechanism further ensures that only relevant retrieved facts are used, preventing "rotten fish" from contaminating answers.
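The gating idea can be sketched as a learned scalar gate that scales the retrieved vector before it is mixed into the hidden state, so an irrelevant lookup contributes little. This is a minimal assumed form: the gate weights, the sigmoid choice, and the additive merge are illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 64  # hidden/embedding width (assumed)

# Hypothetical learned gate weights (would be trained in practice).
W_gate = rng.normal(scale=0.1, size=(DIM,)).astype(np.float32)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def gated_merge(hidden: np.ndarray, retrieved: np.ndarray) -> np.ndarray:
    """Blend retrieved memory into the hidden state with a per-position
    gate in (0, 1) computed from the hidden state itself."""
    g = sigmoid(hidden @ W_gate)          # shape (seq_len,)
    return hidden + g[:, None] * retrieved

h = rng.normal(size=(4, DIM)).astype(np.float32)
r = rng.normal(size=(4, DIM)).astype(np.float32)
out = gated_merge(h, r)
print(out.shape)  # (4, 64)
```

Because the gate depends on the current context rather than on the retrieved fact alone, a lookup that collides with an unrelated n-gram can be suppressed instead of corrupting the answer.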

If widely adopted, Engram could enable cheaper, faster AI that runs locally without expensive cloud subscriptions, democratizing access to powerful language models. Proper placement of the module early in the network is crucial; deeper insertion erodes its benefits, underscoring the importance of architectural integration.

Original Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 The #DeepSeek paper is available here:
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
