LLM-Handover: Exploiting LLMs for Task-Oriented Handovers

ETH Zürich Robotic Systems Lab
May 29, 2026

Why It Matters

Linking language understanding with physical manipulation lets robots hand over objects intuitively, cutting setup time and error in collaborative workspaces.

Key Takeaways

  • LLM-Handover merges language models with part segmentation for grasp planning
  • Achieves 83% zero‑shot success across varied post‑handover tasks
  • Users preferred its handovers in 86% of study trials
  • Enables context‑aware grasps based on natural‑language task descriptions
  • Demonstrates scalable, language‑driven robot assistance without task‑specific training

Pulse Analysis

Human‑robot collaboration hinges on smooth handovers, yet traditional robotic systems often ignore the downstream use of an object, leading to awkward grips or the need for re‑grasping. Researchers have long sought ways to embed task intent into grasp planning, typically relying on pre‑programmed rules or extensive task‑specific training data. The emergence of large language models (LLMs) offers a new avenue: they can interpret natural‑language instructions and reason about object parts, bridging the semantic gap between a human’s goal and a robot’s motion planning.

LLM-Handover capitalizes on this capability by feeding an RGB‑D image and a textual task description into an LLM that identifies the most relevant object parts for the upcoming action. Coupled with a part‑segmentation network, the system selects grasps that preserve functional features—such as a mug’s handle for pouring or a screwdriver’s tip for screwing. In zero‑shot hardware trials across a range of everyday tasks, the approach delivered an 83% success rate, outperforming baseline methods that lack contextual reasoning. A complementary user study revealed an 86% preference for LLM‑driven handovers, underscoring the perceived naturalness and efficiency of the method.
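To make the pipeline concrete, here is a minimal sketch of the part-aware selection step. It is not the authors' implementation. It assumes that the LLM is asked which parts the receiver needs for the task, and that the robot then prefers grasps on the remaining parts. The names `Grasp`, `parts_to_keep_free`, `select_grasp`, and `fake_llm` are hypothetical, the part labels stand in for the output of a part-segmentation network, and the quality scores stand in for the output of a grasp planner.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass(frozen=True)
class Grasp:
    """A candidate grasp tagged with the object part it contacts (hypothetical type)."""
    part: str
    quality: float  # higher is better; assumed to come from a grasp planner


def parts_to_keep_free(task: str, part_names: Sequence[str],
                       ask_llm: Callable[[str], str]) -> Set[str]:
    """Ask a language model which parts the receiver needs for the task."""
    prompt = (
        f"Task after handover: {task}\n"
        f"Object parts: {', '.join(part_names)}\n"
        "List, comma-separated, the parts the human must be able to hold or use."
    )
    reply = ask_llm(prompt)
    named = {p.strip().lower() for p in reply.split(",")}
    # Keep only names that the segmentation stage actually produced.
    return {p for p in part_names if p.lower() in named}


def select_grasp(grasps: Sequence[Grasp], keep_free: Set[str]) -> Grasp:
    """Pick the best-quality grasp that avoids the parts the human needs."""
    if not grasps:
        raise ValueError("no candidate grasps")
    allowed = [g for g in grasps if g.part not in keep_free]
    # Fall back to all candidates if every grasp touches a needed part.
    pool = allowed or list(grasps)
    return max(pool, key=lambda g: g.quality)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); always answers 'handle'."""
    return "handle"


candidates = [Grasp("handle", 0.9), Grasp("body", 0.7), Grasp("rim", 0.4)]
keep = parts_to_keep_free("drink from the mug", ["handle", "body", "rim"], fake_llm)
chosen = select_grasp(candidates, keep)
print(chosen.part)  # prints "body"
```

In this toy run the handle scores highest as a grasp, but the sketch passes it over because the stubbed model marks it as the part the human needs, so the robot holds the body instead. A real system would replace `fake_llm` with an actual model call and derive the part labels from the segmented RGB-D image.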

The implications extend beyond laboratory demos. Industries such as manufacturing, logistics, and healthcare increasingly deploy collaborative robots (cobots) that must hand tools, components, or medical supplies to human workers. By allowing operators to simply state the intended use—"hand me the wrench for tightening"—LLM-Handover reduces programming overhead and accelerates deployment. Moreover, its zero‑shot nature suggests scalability to new objects and tasks without retraining, a key factor for fast‑moving production lines. As LLMs continue to improve, we can expect richer multimodal reasoning that further blurs the line between language and physical interaction, paving the way for truly intuitive robot assistants.

Original Description

Effective human-robot collaboration depends on task-oriented handovers, where robots present objects in ways that support the partners' intended use. To address this problem, we propose LLM-Handover, a novel framework that integrates large language model (LLM)-based reasoning with part segmentation to enable context-aware grasp selection and execution. Given an RGB-D image and a task description, our system infers relevant object parts and selects grasps that optimize post-handover usability. We show that LLM-Handover achieves higher grasp success rates and adapts better to post-handover task constraints. During hardware experiments, we achieve a success rate of 83% in a zero-shot setting over a variety of post-handover tasks. Finally, our user study underlines that our method enables more intuitive, context-aware handovers, with participants preferring it in 86% of cases.
For detailed information, see our paper, accepted to IEEE Robotics and Automation Letters (RA-L).
