Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
By: Debasmita Ghose, Oz Gitelson, Marynel Vazquez and more
Potential Business Impact:
Robot understands what you want by watching and listening.
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference), a goal-prediction method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
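The rule for when to ask a clarifying question can be illustrated with a small sketch. This is not the BALI implementation; it is a minimal illustration under stated assumptions. A finite dictionary of candidate goals stands in for the open-ended goal hypotheses, the answer model `likelihood[goal][answer]` is a hypothetical P(answer | goal), and the function names and the `interruption_cost` parameter (measured in bits) are invented for this example.

```python
import math
from typing import Dict, List

Belief = Dict[str, float]                  # P(goal), assumed normalized
AnswerModel = Dict[str, Dict[str, float]]  # hypothetical P(answer | goal)


def entropy(belief: Belief) -> float:
    """Shannon entropy of a goal belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def posterior(belief: Belief, likelihood: AnswerModel, answer: str) -> Belief:
    """Bayes update of the goal belief after hearing one answer."""
    unnorm = {g: p * likelihood[g].get(answer, 0.0) for g, p in belief.items()}
    z = sum(unnorm.values())
    # If the answer is impossible under every goal, keep the prior unchanged.
    return {g: v / z for g, v in unnorm.items()} if z > 0 else dict(belief)


def expected_information_gain(
    belief: Belief, likelihood: AnswerModel, answers: List[str]
) -> float:
    """Expected entropy reduction from asking, averaged over possible answers."""
    h_prior = entropy(belief)
    gain = 0.0
    for a in answers:
        p_a = sum(belief[g] * likelihood[g].get(a, 0.0) for g in belief)
        if p_a > 0:
            gain += p_a * (h_prior - entropy(posterior(belief, likelihood, a)))
    return gain


def should_ask(
    belief: Belief,
    likelihood: AnswerModel,
    answers: List[str],
    interruption_cost: float,
) -> bool:
    """Ask only when the expected gain exceeds the assumed interruption cost."""
    return expected_information_gain(belief, likelihood, answers) > interruption_cost


# Hypothetical question: "Should the dish be served hot?"
answers = ["yes", "no"]
likelihood = {
    "soup": {"yes": 1.0, "no": 0.0},
    "salad": {"yes": 0.0, "no": 1.0},
}

uncertain = {"soup": 0.5, "salad": 0.5}
confident = {"soup": 0.95, "salad": 0.05}

ask_when_uncertain = should_ask(uncertain, likelihood, answers, interruption_cost=0.3)
ask_when_confident = should_ask(confident, likelihood, answers, interruption_cost=0.3)
```

In this toy setup the question fully separates the two goals, so with an even belief the expected gain is one bit and the robot asks. When the belief is already 95% on one goal, the expected gain drops below the assumed cost of 0.3 bits and the robot stays quiet, which mirrors the trade-off described in the abstract.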