DeepSeek’s New AI Is A Game Changer

May 22, 2026

Two Minute Papers

Why It Matters

By cutting visual token costs and enhancing interpretability, DeepSeek’s approach makes state‑of‑the‑art vision AI affordable and accessible for enterprises and researchers alike.

Key Takeaways

•DeepSeek introduces visual‑pointing reasoning to reduce token usage.
•Model uses roughly 90% fewer visual tokens while matching frontier-model accuracy.
•Distillation from expert teachers enables multi‑task visual thinking.
•Open‑source blueprint promises cheaper, more interpretable AI systems.
•Limitations include cue‑dependent reasoning and reduced fine‑detail accuracy.
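
To put the 90% figure from the takeaways in concrete terms, here is a minimal Python sketch of the arithmetic. The baseline of 1,000 visual tokens per image is a hypothetical number chosen for illustration; it does not come from the video or the paper.

```python
def visual_token_budget(baseline_tokens: int, reduction_percent: int = 90) -> tuple[int, int]:
    """Return (remaining_tokens, saved_tokens) after a percentage reduction.

    Integer arithmetic is used so the result is exact.
    """
    if baseline_tokens < 0:
        raise ValueError("baseline_tokens must be non-negative")
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    saved = baseline_tokens * reduction_percent // 100
    return baseline_tokens - saved, saved


# Hypothetical baseline: 1,000 visual tokens per image (an assumption).
remaining, saved = visual_token_budget(1000)
print(f"remaining={remaining}, saved={saved}")
```

Under that assumed baseline, a 90% reduction leaves one tenth of the original visual tokens to process per image.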

Summary

The video spotlights DeepSeek’s new AI architecture, which replaces traditional dense visual processing with a "pointing" paradigm that lets the system reference specific image regions while it reasons. By treating visual primitives as tokens, the model cuts visual token consumption by roughly 90% while still delivering accuracy on par with commercial frontier models.

Key technical insights include a policy-distillation objective that trains a student model on multiple specialist teachers: one excels at bounding-box detection, another at maze traversal, and so forth. This multi-expert distillation yields a single model capable of topological reasoning, visual tracing, and fast, token-efficient inference, as demonstrated on benchmarks spanning counting, object localization, and maze navigation.

The presenter highlights vivid examples: counting people by literally pointing at each individual, tracing a maze’s solution path, and visualizing the reasoning chain behind answers. The research is openly available and free to use, and a 671-billion-parameter instance has been run on Lambda’s GPU cloud, showcasing both speed and scalability.

The implications are significant: lower token usage translates to lower compute costs, and the visual-pointing approach makes the model’s reasoning easier to interpret and its errors easier to diagnose. Although the system requires explicit cues and may struggle with ultra-fine details, its open-source blueprint could democratize high-performance vision AI and pressure proprietary vendors to adopt more efficient, transparent designs.
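
The counting-by-pointing example can be illustrated with a small Python sketch. It assumes the model interleaves point references with its text in a `<point x="..." y="..."/>` form; that tag syntax, the function names, and the sample reasoning string are all hypothetical and are not taken from the paper. The idea shown is only that the final count can be checked against the points the model actually emitted.

```python
import re
from typing import List, Tuple

# Hypothetical tag format, assumed purely for illustration.
POINT_TAG = re.compile(r'<point\s+x="(\d+(?:\.\d+)?)"\s+y="(\d+(?:\.\d+)?)"\s*/>')


def extract_points(reasoning: str) -> List[Tuple[float, float]]:
    """Collect every (x, y) point referenced in a reasoning trace, in order."""
    return [(float(x), float(y)) for x, y in POINT_TAG.findall(reasoning)]


def count_by_pointing(reasoning: str) -> int:
    """Count distinct pointed-at locations; repeated points are counted once."""
    return len(set(extract_points(reasoning)))


# Made-up reasoning trace, not real model output.
example = (
    'Person one <point x="120" y="340"/>, person two <point x="410" y="355"/>, '
    'person three <point x="655" y="330"/>. So there are three people.'
)

print(extract_points(example))     # [(120.0, 340.0), (410.0, 355.0), (655.0, 330.0)]
print(count_by_pointing(example))  # 3
```

Because each counted object is tied to an explicit coordinate, a wrong answer can be traced to a missing or misplaced point, which is the kind of error diagnosis the summary describes.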

Original Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://github.com/ailuntx/Thinking-with-Visual-Primitives

https://huggingface.co/datasets/NodeLinker/deepseek-ai-Thinking-with-Visual-Primitives-deleted-repo/blob/main/Thinking_with_Visual_Primitives.pdf

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu

#deepseek
