AI • Science

How DeepMind’s New AI Predicts What It Cannot See

Two Minute Papers • March 7, 2026

Why It Matters

By collapsing multiple 3‑D reconstruction stages into a single, fast model, D4RT enables scalable, real‑time digital‑world generation, opening new revenue streams for AR/VR, gaming and autonomous‑system developers.

Key Takeaways

  • D4RT reconstructs 4-D scenes from a single video using one transformer
  • The model simultaneously estimates depth, motion, and camera pose without separate networks
  • Handles occlusions by predicting unseen points from temporal context
  • Achieves up to 300× speedup over prior Gaussian-splat methods
  • Output is a raw point cloud, limiting direct use for rendering or physics

Summary

DeepMind’s new D4RT system pushes 4-D scene reconstruction from ordinary video into the mainstream, turning a 2-D clip into a dynamic point cloud that captures depth, motion and time.

Unlike earlier pipelines that stitched together separate depth, optical‑flow and pose networks and relied on costly test‑time optimization, D4RT uses a single transformer to infer geometry, motion and camera parameters in one pass. The model learns to predict occluded points by leveraging temporal context, and benchmarks show up to 300× faster processing than Gaussian‑splat approaches.
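To make the architectural contrast concrete, here is a deliberately toy Python sketch. Every name in it is invented for illustration, and nothing is taken from the D4RT code or paper; the point is only the interface shape: one model call returning depth, motion and camera pose together, instead of three separate networks followed by an optimization loop.

```python
# Toy sketch, not D4RT's real code: all names are invented for
# illustration. One forward pass returns all three quantities,
# with no separate depth/flow/pose networks and no per-video
# test-time optimization step.
from dataclasses import dataclass

@dataclass
class SceneEstimate:
    depth: list        # one depth map per frame
    motion: list       # one 3-D motion field per frame
    camera_pose: list  # one camera pose per frame

def unified_pass(frames, model):
    # Single call replaces the old multi-stage pipeline.
    return model(frames)

def dummy_model(frames):
    # Stand-in for the transformer: returns placeholder estimates.
    n = len(frames)
    return SceneEstimate([0.0] * n, [(0.0, 0.0, 0.0)] * n, [None] * n)

est = unified_pass(["f0", "f1", "f2"], dummy_model)
```

The practical consequence sketched here is that latency is bounded by one forward pass, which is where the reported speedup over optimization-heavy methods comes from.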

The paper demonstrates vivid examples—judo matches and fast‑moving objects—where the AI fills in hidden surfaces as they disappear behind obstacles. Researchers from DeepMind, UCL and Oxford note that the output remains a raw point cloud, which cannot be directly 3‑D printed or used for photorealistic rendering without a meshing step, and editing tools like Blender still struggle with it.
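As a toy illustration of the occlusion idea (not D4RT's actual mechanism, which is learned end to end rather than hand-written), a tracked point that vanishes behind an obstacle can have its hidden positions guessed from its visible temporal context, here by simple linear extrapolation:

```python
def fill_occluded(track):
    """Fill None gaps in a 1-D point track by continuing the most
    recently observed velocity. A crude, hand-written stand-in for
    what D4RT learns from temporal context."""
    filled = list(track)
    for i, p in enumerate(filled):
        if p is None and i >= 2 and filled[i - 1] is not None \
                and filled[i - 2] is not None:
            # extrapolate: last position + last observed step
            filled[i] = filled[i - 1] + (filled[i - 1] - filled[i - 2])
    return filled

# Point moves right at constant speed, is hidden at frames 3-4,
# then reappears where the extrapolation predicted.
track = [0.0, 1.0, 2.0, None, None, 5.0]
print(fill_occluded(track))  # → [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

A learned model can of course do far better than constant velocity, since it conditions on the full video rather than two past positions.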

If integrated with downstream mesh‑generation or physics pipelines, D4RT could accelerate AR/VR content creation, robotics perception and virtual production, lowering the cost and time of high‑fidelity digital twins. Its open‑source release also signals a shift toward unified, real‑time 3‑D capture solutions for industry.

Original Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers
📝 The paper is available here:
https://d4rt-paper.github.io/
Our Gaussian Material Synthesis paper:
https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/
Tweet link: https://x.com/GoogleDeepMind/status/2014352808426807527
Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi
My research: https://cg.tuwien.ac.at/~zsolnai/
