LLMs Help Robots Understand Vague Instructions and Focus on Key Details
Why It Matters
Automating instruction disambiguation reduces robot training labor and boosts safety in shared workspaces, accelerating collaborative‑robot adoption across manufacturing, logistics, and office environments.
Key Takeaways
- Masked IRL reduces demonstration data by ~5× versus prior methods
- Robots identify task‑relevant details with 15% higher accuracy
- Two LLMs: one clarifies language, another masks irrelevant features
- Real‑world tests show safe navigation around laptops and humans after 50 demos
- Future work adds camera vision for dynamic masking of nearby objects
Pulse Analysis
Robots have long struggled with vague human commands, often requiring exhaustive step‑by‑step programming or large libraries of demonstrations. Recent advances in large language models (LLMs) provide a new avenue for interpreting natural language, but translating that understanding into precise motion plans remains a challenge. MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) tackled this gap with Masked Inverse Reinforcement Learning (Masked IRL), a dual‑LLM pipeline that first expands ambiguous prompts into detailed instructions and then masks out environmental factors deemed irrelevant to the task. By coupling language clarification with a binary relevance scoring system, the method teaches robots which details truly matter, dramatically streamlining the learning process.
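To make the masking idea concrete, here is a minimal sketch of how a binary relevance mask might filter a robot's state features before a reward is computed. It is an illustration under stated assumptions, not CSAIL's implementation: the feature names, the linear reward, the weights, and the hard-coded mask (standing in for the relevance LLM's output) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical state features; the names are illustrative, not from the CSAIL work.
FEATURES: List[str] = [
    "dist_to_laptop",
    "dist_to_human",
    "height_above_table",
    "dist_to_window",
]


def apply_mask(state: Dict[str, float], mask: Dict[str, int]) -> List[float]:
    """Zero out every feature whose binary relevance score is 0."""
    return [state[name] * mask[name] for name in FEATURES]


def reward(state: Dict[str, float], mask: Dict[str, int], weights: List[float]) -> float:
    """Assumed linear reward over the masked features (a simplification)."""
    masked = apply_mask(state, mask)
    return sum(w * x for w, x in zip(weights, masked))


# Stand-in for what a relevance LLM might return for the instruction
# "stay away from the laptop": only the laptop distance is marked relevant.
laptop_mask = {
    "dist_to_laptop": 1,
    "dist_to_human": 0,
    "height_above_table": 0,
    "dist_to_window": 0,
}

# Hypothetical reward weights, one per entry in FEATURES.
weights = [2.0, 1.0, 0.5, -0.3]

state_a = {"dist_to_laptop": 0.4, "dist_to_human": 1.2,
           "height_above_table": 0.3, "dist_to_window": 2.0}
# Same scene, except a masked (irrelevant) feature has changed.
state_b = dict(state_a, dist_to_window=5.0)

reward_a = reward(state_a, laptop_mask, weights)
reward_b = reward(state_b, laptop_mask, weights)
print(reward_a, reward_b)
```

Because the window distance is masked out, both states receive the same reward. That invariance suggests why masking could reduce the number of demonstrations needed: the learner no longer has to see examples that vary every irrelevant detail to discover that it does not matter.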
In controlled experiments, Masked IRL achieved a five‑fold reduction in required kinesthetic demonstrations while boosting preference‑recognition accuracy by 15% over leading baselines. The system’s two‑LLM architecture—one for language elaboration, another for relevance masking—enabled a robotic arm to safely navigate around obstacles such as laptops and human coworkers after only 50 guided demos. Real‑world trials showcased the robot delivering a coffee mug, wiping a table, and handing over a snack, all while respecting nuanced constraints like “stay away from the laptop.” These results highlight the practical safety and efficiency gains that arise when robots can infer unstated user intent.
The implications for industry are significant. Reduced training overhead and improved safety lower the barriers to deploying collaborative robots in warehouses, offices, and factories, where human‑robot interaction is increasingly common. CSAIL’s roadmap includes adding visual perception so that robots can dynamically mask irrelevant objects captured by onboard cameras, further reducing the need for pre‑programmed knowledge. As LLM‑driven instruction disambiguation matures, businesses can expect faster integration cycles, reduced labor costs, and broader adoption of autonomous agents that adapt to real‑world nuances.