GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality
By: Xiyun Hu, Dizhi Ma, Fengming He, and more
Potential Business Impact:
Lets you talk to computers using hands and voice.
Large Language Model (LLM)-based copilots have shown great potential in Extended Reality (XR) applications. However, users face challenges when describing 3D environments to these copilots because spatial-temporal information is difficult to convey through text or speech alone. To address this, we introduce GesPrompt, a multimodal XR interface that combines co-speech gestures with speech, allowing end-users to communicate more naturally and accurately with LLM-based copilots in XR environments. By incorporating gestures, GesPrompt extracts spatial-temporal references from co-speech gestures, reducing the need for precise textual prompts and minimizing cognitive load for end-users. Our contributions include (1) a workflow to integrate gesture and speech input in the XR environment, (2) a prototype VR system that implements the workflow, and (3) a user study demonstrating its effectiveness in improving user communication in VR environments.
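The core idea, grounding deictic speech ("this", "there") in gesture-derived 3D positions before prompting the LLM, can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration, not the paper's implementation: the data structures, the time-alignment rule, and the prompt format are all assumptions.

```python
# Hypothetical sketch: align deictic words in a speech transcript with
# the 3D positions of co-speech pointing gestures, then fold both into
# one text prompt for an LLM copilot. Names and thresholds are assumed.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GestureEvent:
    """A pointing gesture resolved to a 3D position (assumed world coordinates)."""
    timestamp: float            # seconds since the utterance started
    position: Tuple[float, float, float]

@dataclass
class SpokenWord:
    """One word of the speech transcript with its timing."""
    timestamp: float
    text: str

DEICTIC_WORDS = {"this", "that", "here", "there"}

def build_prompt(words: List[SpokenWord],
                 gestures: List[GestureEvent],
                 max_gap: float = 0.5) -> str:
    """Replace each deictic word with the position of the gesture closest
    to it in time (within max_gap seconds), producing a plain-text prompt."""
    resolved = []
    for w in words:
        if w.text.lower() in DEICTIC_WORDS and gestures:
            nearest = min(gestures, key=lambda g: abs(g.timestamp - w.timestamp))
            if abs(nearest.timestamp - w.timestamp) <= max_gap:
                x, y, z = nearest.position
                resolved.append(f"{w.text} [at ({x:.2f}, {y:.2f}, {z:.2f})]")
                continue
        resolved.append(w.text)
    return " ".join(resolved)

if __name__ == "__main__":
    words = [SpokenWord(0.0, "move"), SpokenWord(0.3, "this"),
             SpokenWord(0.6, "chair"), SpokenWord(0.9, "over"),
             SpokenWord(1.2, "there")]
    gestures = [GestureEvent(0.35, (1.2, 0.0, 2.4)),
                GestureEvent(1.25, (3.0, 0.0, 1.1))]
    print(build_prompt(words, gestures))
    # -> move this [at (1.20, 0.00, 2.40)] chair over there [at (3.00, 0.00, 1.10)]
```

The augmented prompt then stands on its own: the LLM receives concrete coordinates instead of ambiguous pronouns, which is the kind of spatial-temporal reference the abstract says GesPrompt extracts from gestures.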
Similar Papers
Large Language Models for Virtual Human Gesture Selection
Human-Computer Interaction
Makes virtual characters gesture naturally when they talk.
Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality
Human-Computer Interaction
Control robots with your voice and hands.
Improving Cooperation in Collaborative Embodied AI
Artificial Intelligence
AI agents work together better using smart instructions.