Veo 3 demonstrates that AI can acquire complex visual‑physics reasoning without explicit instruction, heralding a new era of low‑cost, high‑quality video generation that could disrupt media production, simulation, and design industries.
The video spotlights DeepMind’s latest generative video model, Veo 3, which can turn a simple text prompt into high‑fidelity video. The presenter, Dr. Károly Zsolnai‑Fehér of Two Minute Papers, frames the announcement as a “game‑changing” moment for AI, noting that the system produces photorealistic motion that rivals hand‑crafted physics simulations.
Veo 3’s capabilities emerge without explicit programming: it can mix colors, simulate specular highlights, perform object‑to‑object transformations, and even handle soft‑body dynamics and refractions. The model responds to prompts such as “roll a burrito” or “turn a teacup into a mouse,” generating seamless frame‑by‑frame reasoning that the authors describe as a “chain of frames,” analogous to step‑by‑step chain‑of‑thought reasoning in large language models.
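The “chain of frames” idea can be illustrated with a toy sketch: each frame is generated conditioned on the previous one, the way a language model emits one token at a time. Everything below is hypothetical — Veo 3’s actual architecture is not public at this level of detail, and the `next_frame` function here is a hand‑written physics stand‑in for a learned model.

```python
# Toy "chain of frames": roll a model forward one frame at a time,
# each frame conditioned on the last. The scene is a bouncing ball
# tracked as height x and velocity v -- a stand-in for a learned
# video model, purely for illustration.

FPS = 24  # assumed frame rate

def next_frame(prev: dict) -> dict:
    """Advance the toy scene state by one frame (hypothetical model step)."""
    x, v = prev["x"], prev["v"]
    v += -9.8 / FPS          # gravity applied over one frame
    x += v / FPS
    if x < 0:                # bounce off the ground with some energy loss
        x, v = -x, -v * 0.8
    return {"x": x, "v": v}

def generate_video(first_frame: dict, n_frames: int) -> list[dict]:
    """The 'chain of frames': each frame depends only on the previous one."""
    frames = [first_frame]
    for _ in range(n_frames - 1):
        frames.append(next_frame(frames[-1]))
    return frames

clip = generate_video({"x": 2.0, "v": 0.0}, 48)  # two seconds at 24 fps
```

The point of the analogy is that no frame is planned globally in advance; coherent motion emerges from many small conditional steps, just as coherent arguments emerge token by token in a language model.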
The presenter highlights striking examples—a golden spoon’s reflections staying consistent across a moving armature, a Rorschach‑style inkblot that morphs into crabs, and flawless in‑painting, out‑painting, super‑resolution, and denoising of low‑light footage. He emphasizes that these feats are not the result of engineered pipelines but emergent behavior learned from massive video corpora, likening the AI’s learning process to that of a child.
While the technology is still costly and occasionally unreliable—producing “magician‑like” errors and failing basic IQ‑style tests—the implications are profound. Veo 3 could democratize high‑end visual effects, accelerate scientific visualization, and reshape content creation pipelines, while future iterations (Veo 5 and beyond) promise even broader creative and commercial applications.