Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody
By: David Sasu, Kweku Andoh Yamoah, Benedict Quartey, and more
Potential Business Impact:
Robots understand your voice commands better.
Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
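The pipeline the abstract describes, extracting prosodic cues from the speech signal, predicting an intent label, and injecting that label into an LLM prompt for plan selection, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the autocorrelation pitch tracker, the rising-pitch heuristic, the intent labels, and the prompt template are all hypothetical stand-ins, not the paper's actual models.

```python
import numpy as np

def pitch_autocorr(frame, sr):
    """Estimate fundamental frequency of one frame via autocorrelation (toy tracker)."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 75  # search a plausible speech pitch range (~75-400 Hz)
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

def prosody_features(signal, sr, frame_len=1024, hop=512):
    """Frame-level F0 contour reduced to simple summary statistics."""
    f0 = np.array([pitch_autocorr(signal[i:i + frame_len], sr)
                   for i in range(0, len(signal) - frame_len, hop)])
    slope = np.polyfit(np.arange(len(f0)), f0, 1)[0]  # Hz per frame
    return {"f0_mean": float(f0.mean()), "f0_slope": float(slope)}

def infer_intent(feats, rising_slope=0.5):
    # Toy rule: rising pitch often signals a question or uncertainty,
    # flat/falling pitch a plain directive. A real system would use a
    # learned classifier over richer prosodic features.
    return "clarification-needed" if feats["f0_slope"] > rising_slope else "directive"

def build_prompt(transcript, intent):
    # Hypothetical in-context prompt: the predicted prosodic intent is
    # injected alongside the transcript for the LLM task planner.
    return (f"Instruction: {transcript}\n"
            f"Prosodic intent: {intent}\n"
            f"Select the task plan consistent with this intent.")

# Synthetic "utterance": a chirp whose pitch rises from 120 Hz to 220 Hz.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
freq = 120 + 100 * t
rising = np.sin(2 * np.pi * np.cumsum(freq) / sr)
feats = prosody_features(rising, sr)
print(infer_intent(feats))
print(build_prompt("put the block there", infer_intent(feats)))
```

The point of the sketch is the interface, not the heuristic: prosody is summarized into a discrete intent signal that the text-only LLM would otherwise never see, which is what lets it disambiguate between otherwise identical transcripts.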
Similar Papers
Affordance-Based Disambiguation of Surgical Instructions for Collaborative Robot-Assisted Surgery
Robotics
Robot surgeon understands doctor's spoken commands better.
ProVox: Personalization and Proactive Planning for Situated Human-Robot Collaboration
Robotics
Robot learns your needs, helps you faster.