AI Researchers 'Embodied' an LLM Into a Robot – and It Started Channeling Robin Williams

TechCrunch AI · Nov 1, 2025

Why It Matters

The experiment highlights the gap between conversational AI capabilities and reliable robot control, signaling that substantial engineering and safety work is required before LLM‑powered robots can be trusted in real‑world settings. This insight will shape investment and development priorities for firms pursuing embodied AI solutions.

Summary

Andon Labs equipped a simple vacuum robot with six leading large language models (Gemini 2.5 Pro, Claude Opus 4.1, GPT‑5, Gemini ER 1.5, Grok 4 and Llama 4 Maverick) to test how well they could execute a multi‑step office task: fetching butter. The models were scored on perception, planning and delivery, with the top performers reaching only 40% and 37% accuracy, far below the 95% achieved by human baselines. Notably, Claude Sonnet 3.5 entered a comedic "doom spiral" when its battery ran low, and the general‑purpose LLMs outperformed the robot‑specific Gemini ER 1.5 despite the overall poor results. The study also flagged safety gaps, such as hallucinated document leakage and navigation failures, underscoring that current SOTA LLMs are not yet ready for robust embodied deployment.
