
Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody

Published: June 1, 2025 | arXiv ID: 2506.02057v1

By: David Sasu, Kweku Andoh Yamoah, Benedict Quartey and more

Potential Business Impact:

Robots understand your voice commands better.

Business Areas:
Semantic Search, Internet Services

Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
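As a rough illustration of how a predicted intent might be passed to a language model through in-context learning, the sketch below assembles a few-shot prompt that asks the model to choose among candidate task plans. This is not the authors' implementation: the `Example` class, the `build_prompt` function, the prompt wording, and the intent labels are all hypothetical, and the prosody-based intent predictor is assumed to exist upstream.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One in-context demonstration (all fields are illustrative assumptions)."""
    transcript: str   # text of the spoken instruction
    intent: str       # intent label assumed to come from a prosody model
    plans: list[str]  # candidate task plans for the ambiguous instruction
    chosen: int       # zero-based index of the correct plan


def build_prompt(examples, transcript, intent, plans):
    """Assemble a few-shot prompt asking a language model to pick one plan."""

    def block(text, label, candidates):
        # Render one instruction with its intent label and numbered plans.
        lines = [
            f"Instruction: {text}",
            f"Prosodic intent: {label}",
            "Candidate plans:",
        ]
        lines += [f"  {n}. {plan}" for n, plan in enumerate(candidates, start=1)]
        return lines

    parts = ["Select the task plan that matches the speaker's intent."]
    for ex in examples:
        parts += [""] + block(ex.transcript, ex.intent, ex.plans)
        parts.append(f"Answer: {ex.chosen + 1}")

    # The query uses the same layout, with the answer left open for the model.
    parts += [""] + block(transcript, intent, plans)
    parts.append("Answer:")
    return "\n".join(parts)


# Hypothetical demonstration and query; the labels are invented for this sketch.
demo = Example(
    transcript="Put the cup on the tray by the sink",
    intent="emphasis on 'tray'",
    plans=[
        "Move the cup onto the tray that is by the sink",
        "Move the cup that is on the tray to the sink",
    ],
    chosen=0,
)
prompt = build_prompt(
    [demo],
    transcript="Hand me the box on the shelf near the door",
    intent="emphasis on 'box'",
    plans=[
        "Fetch the box from the shelf that is near the door",
        "Fetch the box that is on the shelf and bring it near the door",
    ],
)
```

The resulting `prompt` string would then be sent to whichever language model is in use, and the number it returns would be mapped back to the selected plan.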

Country of Origin
🇩🇰 🇺🇸 United States, Denmark

Page Count
5 pages

Category
Computer Science:
Robotics