DeepSeek V4 AI Beats Billion Dollar Systems…For Free

Two Minute Papers
May 6, 2026

Why It Matters

DeepSeek V4 democratizes billion‑dollar‑scale language models, offering comparable performance at negligible cost, which could reshape AI adoption across industry and research.

Key Takeaways

  • DeepSeek V4 offers a 1‑million‑token context window at no licensing cost.
  • New KV‑cache compression reduces cache memory usage by roughly 90%.
  • Pro model matches or exceeds Google’s Gemini 3.1 Pro on fact‑recall and coding benchmarks.
  • Flash variant runs with roughly one‑tenth the compute of prior models.
  • Open‑weight model remains unimodal (text only), lacking image or audio input.

Summary

DeepSeek V4, the latest open‑weight large language model from the Chinese startup DeepSeek, was unveiled with a 58‑page research paper and immediate public access. The model has a 1‑million‑token context window, far larger than most commercial offerings. The weights are free to download and self‑host, and a low‑cost hosted API is also available.

The paper’s core contribution is a three‑part KV‑cache compression pipeline: token‑level summarisation (128‑to‑1), heavily compressed attention, and compressed sparse attention. Together they shrink cache memory by roughly 90% and cut the compute required by the Pro version to about one‑third of its predecessor’s, while the smaller Flash variant needs roughly one‑tenth of the compute. Benchmarks show the Pro model matching or surpassing Google’s Gemini 3.1 Pro on fact‑recall and coding tasks.
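To get a feel for why a 90% cache reduction matters at a 1‑million‑token context, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the model shape (60 layers, 8 key/value heads, head dimension 128, 2 bytes per value) is a hypothetical assumption, not a figure from the paper, and the `pool_blocks` helper uses plain mean pooling as a stand‑in for the 128‑to‑1 summarisation, whose actual method is not described here.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size in bytes of an uncompressed KV cache."""
    # Factor 2: one key vector and one value vector per token, layer and head.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


def pool_blocks(vectors, block=128):
    """Average each run of `block` consecutive vectors into one summary vector.

    Mean pooling is an assumption made for illustration only; it is NOT
    claimed to be the summarisation method used by DeepSeek V4.
    """
    pooled = []
    for start in range(0, len(vectors), block):
        chunk = vectors[start:start + block]
        dim = len(chunk[0])
        pooled.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
    return pooled


# Hypothetical model shape, chosen only to make the numbers concrete.
baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
compressed = baseline * 0.10  # what remains after the "roughly 90%" reduction
print(f"uncompressed: {baseline / 1e9:.1f} GB, after 90% cut: {compressed / 1e9:.1f} GB")

# Toy cache of 256 two-dimensional entries, pooled 128-to-1.
toy_cache = [[float(i), 1.0] for i in range(256)]
summary = pool_blocks(toy_cache)
print(len(toy_cache), "entries ->", len(summary), "entries")
```

Under these assumed dimensions the uncompressed cache for a single 1‑million‑token sequence comes to about 246 GB, which is why compressing it is a precondition for serving such long contexts on ordinary hardware.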

In the video, Dr. Károly Zsolnai‑Fehér highlighted the model’s ability to ingest 1,500 pages of dense documentation and retrieve eight hidden facts more reliably than Gemini. He also demonstrated JavaScript generation that runs directly in the model’s UI, and noted the Engram technique, which lets the system recall facts without recomputing them on each pass.
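A hidden‑fact test of this kind is often set up as a "needle in a haystack" evaluation. The sketch below shows one way such a check could be scored; it is not the procedure used in the video, and the helper names `build_haystack` and `recall_score`, the filler text, and the sample facts are all hypothetical. The model call itself is left out, so the answers are supplied by hand.

```python
import random


def build_haystack(filler_paragraphs, needles, seed=0):
    """Insert each needle sentence at a random position among the filler text."""
    rng = random.Random(seed)  # fixed seed keeps the placement reproducible
    docs = list(filler_paragraphs)
    for needle in needles:
        docs.insert(rng.randrange(len(docs) + 1), needle)
    return "\n\n".join(docs)


def recall_score(answers, expected):
    """Fraction of expected key strings that appear in the matching answer."""
    hits = sum(1 for ans, exp in zip(answers, expected) if exp.lower() in ans.lower())
    return hits / len(expected)


# Hypothetical filler and hidden facts, for illustration only.
filler = [f"Section {i}: routine documentation text." for i in range(1000)]
needles = ["The vault code is 4821.", "The mascot's favourite colour is teal."]
haystack = build_haystack(filler, needles)

# In a real test these answers would come from the model under evaluation.
answers = ["The vault code is 4821", "I could not find that information"]
print("recall:", recall_score(answers, ["4821", "teal"]))
```

Substring matching is a deliberately crude scoring rule; a stricter harness would normalise answers or use exact‑match fields.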

By delivering near‑state‑of‑the‑art performance at zero licensing cost, DeepSeek V4 could dramatically lower entry barriers for startups, academia, and enterprises that need long‑context reasoning. However, its unimodal nature, unexplained training stabilisation tricks, and degradation near the context limit temper expectations. The release signals a shift toward affordable, open AI infrastructure that may pressure incumbent providers on price and accessibility.

Original Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 Check out DeepSeek here:
Sources:
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
Thumbnail design: https://felicia.hu
