Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios
By: Qiping Zhang, Nathan Tsoi, Mofeed Nagib, and more
Understanding how humans evaluate robot behavior during human-robot interactions is crucial for developing socially aware robots that behave according to human expectations. While the traditional approach to capturing these evaluations is to conduct a user study, recent work has proposed utilizing machine learning instead. However, existing data-driven methods require large amounts of labeled data, which limits their use in practice. To address this gap, we propose leveraging the few-shot learning capabilities of Large Language Models (LLMs) to improve how well a robot can predict a user's perception of its performance, and we study this idea experimentally in social navigation tasks. To this end, we extend the SEAN TOGETHER dataset with additional real-world human-robot navigation episodes and participant feedback. Using this augmented dataset, we evaluate the ability of several LLMs to predict human perceptions of robot performance from a small number of in-context examples, based on observed spatio-temporal cues of the robot and surrounding human motion. Our results demonstrate that LLMs can match or exceed the performance of traditional supervised learning models while requiring an order of magnitude fewer labeled instances. We further show that prediction performance can improve with additional in-context examples, confirming that our approach scales with the amount of available feedback. Additionally, we investigate what kind of sensor-based information an LLM relies on to make these inferences by conducting an ablation study on the input features considered for performance prediction. Finally, we explore the novel application of personalized in-context examples, i.e., examples drawn from the same user whose perception is being predicted, and find that they further enhance prediction accuracy. This work paves the way toward improving robot behavior in a scalable manner through user-centered feedback.
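To make the in-context learning setup described above more concrete, the sketch below shows one way a few labeled navigation episodes could be assembled into a few-shot prompt that asks an LLM to rate a new episode. This is a minimal illustration only: the feature names (robot speed, closest pedestrian distance, crowd size), the 1-to-5 rating scale, and the episode format are assumptions, not the authors' actual features, prompt, or model interface, and the call to an LLM endpoint is left out.

```python
# Minimal sketch (not the authors' code): build a few-shot prompt that asks an
# LLM to predict a human rating of robot navigation from coarse spatio-temporal
# features. Feature names and the 1-5 rating scale are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    """One navigation episode summarized as coarse spatio-temporal features."""
    robot_speed: float          # mean robot speed in m/s (assumed feature)
    min_person_distance: float  # closest distance to any pedestrian in m (assumed feature)
    num_people: int             # number of nearby pedestrians (assumed feature)
    rating: Optional[int] = None  # human rating on a 1-5 scale; None for the query episode


def format_episode(ep: Episode) -> str:
    """Render an episode as a compact text line for the prompt."""
    line = (f"robot_speed={ep.robot_speed:.2f} m/s, "
            f"min_person_distance={ep.min_person_distance:.2f} m, "
            f"num_people={ep.num_people}")
    if ep.rating is not None:
        line += f" -> rating: {ep.rating}"
    return line


def build_few_shot_prompt(examples: List[Episode], query: Episode) -> str:
    """Concatenate labeled in-context examples followed by the unlabeled query."""
    header = ("You rate how well a robot navigated around people, "
              "on a scale from 1 (very poor) to 5 (very good).\n"
              "Examples:\n")
    shots = "\n".join(format_episode(ep) for ep in examples)
    question = ("\nNow rate this episode. Answer with a single integer.\n"
                + format_episode(query) + " -> rating:")
    return header + shots + question


if __name__ == "__main__":
    # A handful of labeled episodes serve as in-context examples; in the
    # personalized variant mentioned in the abstract, these would be drawn
    # from the same user whose perception is being predicted.
    shots = [
        Episode(0.45, 1.8, 2, rating=4),
        Episode(0.90, 0.4, 3, rating=2),
        Episode(0.30, 2.5, 1, rating=5),
    ]
    query = Episode(0.70, 0.6, 4)
    print(build_few_shot_prompt(shots, query))
    # The resulting prompt would then be sent to an LLM, and the returned
    # integer taken as the predicted human rating for the query episode.
```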