Google DeepMind Robotics Lab Tour with Hannah Fry

Google DeepMind
Dec 10, 2025

Why It Matters

DeepMind’s integration of large multimodal models into physical robots is a significant step toward commercially viable, general‑purpose automation that can understand and act on natural language, with the potential to reshape industries from consumer services to supply‑chain operations.

Summary

In a behind‑the‑scenes tour of Google DeepMind’s robotics lab, host Hannah Fry and Director of Robotics Kanishka Rao showcase the latest generation of general‑purpose robots built on large multimodal models. The discussion frames the shift from narrowly programmed manipulators to open‑ended agents that can interpret natural language, reason about actions, and execute long‑horizon tasks. Central to this evolution are Vision‑Language‑Action (VLA) models that treat visual inputs, textual instructions, and motor commands as a unified token stream, enabling “action generalization” across novel objects and scenes.
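
As an illustration of that idea, the sketch below shows how one training example might interleave visual tokens, instruction tokens, and discretized motor commands into a single sequence for an autoregressive model. The tokenizer, action binning, and arm dimensions are assumptions made for illustration; they are not details confirmed in the video.

```python
# Illustrative sketch only: how a Vision-Language-Action (VLA) model can treat
# images, instructions, and motor commands as one token stream. The binning
# scheme and 7-DoF action are assumptions, not DeepMind's implementation.
import numpy as np

ACTION_BINS = 256  # assumed: each command dimension discretized into 256 bins

def discretize_action(action, low=-1.0, high=1.0):
    """Map continuous motor commands in [low, high] to integer action tokens."""
    clipped = np.clip(action, low, high)
    return ((clipped - low) / (high - low) * (ACTION_BINS - 1)).astype(int)

def build_token_stream(image_tokens, text_tokens, action):
    """Interleave vision, language, and action tokens into one flat sequence.

    image_tokens: tokens from a visual backbone (e.g. a patch encoder)
    text_tokens:  tokens from the instruction, e.g. "pack the lunch box"
    action:       the continuous motor command the model learns to predict
    """
    action_tokens = discretize_action(action)
    # A single autoregressive transformer can then be trained to predict the
    # trailing action tokens conditioned on everything that precedes them.
    return list(image_tokens) + list(text_tokens) + list(action_tokens)

# One timestep of a demonstration becomes one flat training sequence.
stream = build_token_stream(
    image_tokens=range(1000, 1256),   # placeholder visual tokens
    text_tokens=[17, 92, 408],        # placeholder instruction tokens
    action=np.array([0.1, -0.3, 0.0, 0.5, -0.2, 0.7, 1.0]),  # assumed 7-DoF arm
)
print(len(stream), "tokens in the unified stream")
```

Discretizing continuous commands into a fixed vocabulary is one common way to let a language-model-style decoder predict actions the same way it predicts text, which is what makes the "unified token stream" framing possible.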

Key technical insights include the integration of Gemini‑style large language models with robust visual backbones, allowing robots to operate without controlled lighting or privacy screens. The lab demonstrates two capabilities in the Gemini Robotics 1.5 rollout: an “agentic” component that orchestrates sequences of subtasks, and a “thinking” component that generates chain‑of‑thought style reasoning before each motion, mirroring chain‑of‑thought prompting in large language models. Demonstrations range from millimeter‑precise lunch‑box packing to dynamic object manipulation (e.g., sorting blocks, opening a pear lid) and a humanoid that sorts laundry while verbalizing its internal thoughts.
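
A rough sketch of how those two capabilities could fit together appears below: an agentic layer splits an instruction into subtasks, and the policy verbalizes a short thought before emitting each motion. The function names and stubbed outputs are hypothetical; the actual Gemini Robotics 1.5 interfaces are not described in the video.

```python
# Hedged sketch of the "agentic" + "thinking" loop described above. The model
# calls are stubs standing in for the real planner and policy.
from dataclasses import dataclass

@dataclass
class Step:
    thought: str   # verbalized reasoning emitted before the motion
    action: list   # low-level motor command for this motion

def plan_subtasks(instruction: str) -> list:
    """Stub for the agentic component: break one instruction into subtasks."""
    return [f"{instruction} (subtask {i})" for i in range(1, 4)]

def think_then_act(subtask: str) -> Step:
    """Stub for the thinking policy: reason in text first, then act."""
    thought = f"To do '{subtask}', I should locate the object before grasping it."
    return Step(thought=thought, action=[0.0] * 7)  # placeholder 7-DoF command

def run(instruction: str) -> None:
    for subtask in plan_subtasks(instruction):
        step = think_then_act(subtask)
        print("THOUGHT:", step.thought)
        # In a real system, step.action would be streamed to the robot controller.

run("sort the laundry by colour")
```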

Notable moments include the robot’s ability to answer high‑level queries—such as checking the weather before packing a bag—and to adapt to completely unseen items like a stress ball or a Doritos bag, highlighting the system’s zero‑shot generalization. The researchers explain a hierarchical architecture where a reasoning‑focused ER (embodied reasoning) model plans tasks and dispatches them to the VLA for execution, while some humanoid prototypes operate end‑to‑end without explicit hierarchy, directly outputting both thoughts and actions.
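
The hierarchical variant they describe could be pictured roughly as follows: a reasoning‑focused planner folds in context such as the weather forecast and hands each resulting subtask to the VLA for execution. The class names, the weather handling, and the interfaces are assumptions made purely for illustration.

```python
# Minimal sketch of the planner/executor split, assuming hypothetical classes.
class ReasoningPlanner:
    """Stands in for the reasoning-focused ER model that plans the task."""

    def plan(self, request: str, forecast: str) -> list:
        # The planner can use context like the forecast, as in the demo where
        # the robot checks the weather before packing a bag.
        subtasks = ["place the sandwich in the bag", "add the water bottle"]
        if "rain" in forecast:
            subtasks.append("pack the umbrella")
        return subtasks

class VLAExecutor:
    """Stands in for the VLA policy that turns each subtask into motion."""

    def execute(self, subtask: str) -> None:
        print(f"executing: {subtask}")  # real system: images + text -> motor commands

def pack_bag(request: str, forecast: str) -> None:
    planner, executor = ReasoningPlanner(), VLAExecutor()
    for subtask in planner.plan(request, forecast):
        executor.execute(subtask)

pack_bag("pack my bag for today", forecast="rain expected this afternoon")
```

The end‑to‑end humanoid prototypes mentioned above would effectively collapse these two roles into a single model that outputs thoughts and actions directly.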

The implications are profound: by marrying foundation models with embodied control, DeepMind is moving toward robots that can be instructed in everyday language and perform complex, multi‑step chores without task‑specific reprogramming. This could accelerate the deployment of service robots in homes, offices, and logistics, turning what was once a research curiosity into a scalable, commercial capability.

Original Description

In this episode, we open the archives on host Hannah Fry’s visit to our California robotics lab. Filmed earlier this year, Hannah interacts with a new set of robots—those that don't just see, but think, plan, and do. Watch as the team goes behind the scenes to test the limits of generalization, challenging robots to handle unseen objects autonomously.
Learn more about our most recent models: https://deepmind.google/models/gemini-robotics/
____
Presenter: Professor Hannah Fry
Video editor: Anthony Le
Audio engineer: Perry Rogantin
Visual identity: Rob Ashley
Commissioned by Google DeepMind
Series Producer: Dan Hardoon
Editor: Rami Tzabar
Commissioner & Producer: Emma Yousif
Music composition: Eleni Shaw
