Google DeepMind Robotics Lab Tour with Hannah Fry
Why It Matters
DeepMind’s integration of large multimodal models into physical robots marks a turning point toward commercially viable, general‑purpose automation that can understand and act on natural language, reshaping industries from consumer services to supply‑chain operations.
Summary
In a behind‑the‑scenes tour of Google DeepMind’s robotics lab, host Hannah Fry and Director of Robotics Kanishka Rao showcase the latest generation of general‑purpose robots built on large multimodal models. The discussion frames the shift from narrowly programmed manipulators to open‑ended agents that can interpret natural language, reason about actions, and execute long‑horizon tasks. Central to this evolution are Vision‑Language‑Action (VLA) models that treat visual inputs, textual instructions, and motor commands as a unified token stream, enabling “action generalization” across novel objects and scenes.
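To make the unified-token-stream idea concrete, here is a minimal Python sketch of how a VLA might fold image patches, instruction text, and discretized motor commands into a single sequence for an autoregressive transformer. The vocabulary layout, bin count, and function names are illustrative assumptions, not DeepMind's actual implementation.

```python
# Minimal sketch of the VLA idea: vision, language, and actions share one
# token vocabulary so a single autoregressive model can emit motor commands.
# All names, offsets, and bin sizes are illustrative, not DeepMind's API.
import numpy as np

N_ACTION_BINS = 256          # each joint command discretized into 256 bins
ACTION_OFFSET = 50_000       # action tokens live in a reserved vocab range

def discretize_action(joint_cmds, low=-1.0, high=1.0):
    """Map continuous joint commands in [low, high] to integer action tokens."""
    norm = (np.clip(joint_cmds, low, high) - low) / (high - low)
    bins = (norm * (N_ACTION_BINS - 1)).astype(int)
    return (ACTION_OFFSET + bins).tolist()

def build_token_stream(image_patch_tokens, instruction_tokens, action_tokens):
    """Concatenate modalities into the single sequence the transformer models."""
    return image_patch_tokens + instruction_tokens + action_tokens

# Toy example: 4 image-patch tokens, a tokenized instruction, one 3-DoF action.
stream = build_token_stream(
    image_patch_tokens=[101, 102, 103, 104],
    instruction_tokens=[2001, 2002, 2003],          # e.g. "pack the lunch box"
    action_tokens=discretize_action(np.array([0.1, -0.4, 0.9])),
)
print(stream)
```

Because actions occupy their own slice of the vocabulary, "action generalization" reduces to the same next-token prediction the model already performs over images and text.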
Key technical insights include the integration of Gemini‑style large language models with robust visual backbones, allowing robots to operate in everyday environments without controlled lighting or privacy screens. The lab demonstrates two new capabilities in the Gemini Robotics 1.5 rollout: an "agentic" component that orchestrates sequences of subtasks, and a "thinking" component that generates chain‑of‑thought‑style reasoning before each motion, mirroring recent advances in LLM prompting. Demonstrations range from millimeter‑precise lunch‑box packing to dynamic object manipulation (e.g., sorting blocks, opening a pear lid) and a humanoid that sorts laundry while verbalizing its internal thoughts.
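The agentic-plus-thinking pattern can be sketched as a simple loop: decompose the instruction into subtasks, emit a short rationale before moving, then act. Everything below (plan_subtasks, think, act, Observation) is a hypothetical stand-in for the Gemini-backed components described in the video.

```python
# Hedged sketch of the "agentic" + "thinking" pattern: an outer loop
# orchestrates subtasks, and before every motion the policy first emits a
# natural-language rationale. The model calls below are placeholders for
# whatever Gemini-style models actually back these steps.
from dataclasses import dataclass

@dataclass
class Observation:
    camera_image: bytes
    proprioception: list[float]

def plan_subtasks(task: str) -> list[str]:
    # Placeholder: a reasoning model would decompose the instruction here.
    return [f"{task}: step {i}" for i in range(1, 4)]

def think(subtask: str, obs: Observation) -> str:
    # Placeholder: chain-of-thought-style rationale emitted before moving.
    return f"I should do '{subtask}' because the object is in view."

def act(subtask: str, thought: str, obs: Observation) -> None:
    # Placeholder: the VLA would emit low-level motor commands here.
    print(f"[thought] {thought}\n[action]  executing: {subtask}")

def run(task: str, obs: Observation) -> None:
    for subtask in plan_subtasks(task):        # agentic orchestration
        thought = think(subtask, obs)          # reason before each motion
        act(subtask, thought, obs)

run("Pack the lunch box", Observation(camera_image=b"", proprioception=[0.0] * 7))
```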
Notable moments include the robot's ability to answer high‑level queries—such as checking the weather before packing a bag—and to adapt to completely unseen items like a stress ball or a Doritos bag, highlighting the system's zero‑shot generalization. The researchers describe a hierarchical architecture in which a reasoning‑focused ER (embodied reasoning) model plans tasks and dispatches them to the VLA for execution, while some humanoid prototypes operate end‑to‑end without an explicit hierarchy, directly outputting both thoughts and actions.
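A hedged sketch of that hierarchy follows: a high-level planner that can consult a tool (here, a stubbed weather lookup echoing the bag-packing demo) before dispatching subtasks to a VLA-style executor. The function names and tool interface are assumptions for illustration, not the actual system's API.

```python
# Illustrative two-level hierarchy: a reasoning-focused "ER" planner that can
# call tools, handing subtasks to a VLA executor. All names are hypothetical.

def get_weather(city: str) -> str:
    return "rainy"  # stub standing in for a real tool call

def er_plan(instruction: str) -> list[str]:
    """High-level planner: reason about the request, optionally call tools."""
    steps = ["open the bag"]
    if "pack" in instruction and get_weather("London") == "rainy":
        steps.append("pack the umbrella")   # weather-conditioned decision
    steps.append("pack the lunch box")
    return steps

def vla_execute(subtask: str) -> None:
    """Low-level executor: in the real system, a VLA turns this into motions."""
    print(f"executing: {subtask}")

for step in er_plan("pack a bag for today"):
    vla_execute(step)
```

In the end-to-end humanoid variant mentioned above, there is no separate planner call: a single model would emit both the rationale and the motor tokens in one stream.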
The implications are profound: by marrying foundation models with embodied control, DeepMind is moving toward robots that can be instructed in everyday language and perform complex, multi‑step chores without task‑specific reprogramming. This could accelerate the deployment of service robots in homes, offices, and logistics, turning what was once a research curiosity into a scalable, commercial capability.