Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models
By: Xijie Zhang, Fengliang He, Hong-Ning Dai
Potential Business Impact:
Lets VR recognize your hand gestures without retraining a model.
Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy concerns. Acoustic sensing offers an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, the channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods typically require training models on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite the strengths of LLMs, achieving few-shot and zero-shot recognition of gestures from CIR data is non-trivial because the gesture-induced features are subtle. To tackle this challenge, we use differential CIR, i.e., frame-to-frame differences that suppress static reflections and highlight motion-induced changes, rather than raw CIR data. Moreover, we construct a real-world dataset collected from 10 participants performing 15 gestures across three categories (digits, letters, and shapes), with 10 repetitions each. We then conduct extensive experiments on this dataset using an LLM-based classifier. Results show that our framework achieves accuracy comparable to classical machine learning baselines while requiring no domain-specific retraining.
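As a rough illustration of the two steps the abstract describes, the sketch below (Python with NumPy) computes differential CIR by frame-to-frame subtraction and serializes the resulting features into a few-shot text prompt for an off-the-shelf LLM. The function names, the per-frame feature summary, and the prompt template are illustrative assumptions on our part, not the paper's exact pipeline.

```python
import numpy as np

def differential_cir(cir_frames: np.ndarray) -> np.ndarray:
    """Compute differential CIR from a sequence of CIR estimates.

    cir_frames: complex array of shape (T, taps), one CIR per frame.
    Subtracting consecutive frames suppresses static reflections
    (walls, the device itself) and keeps only motion-induced changes,
    which is what makes subtle gestures visible to a classifier.
    """
    return np.abs(np.diff(cir_frames, axis=0))

def frames_to_prompt(diff_cir: np.ndarray, labels, examples) -> str:
    """Serialize differential-CIR features into a few-shot text prompt.

    `examples` is a list of (feature_summary, label) pairs used as
    in-context demonstrations; the query sample is appended last.
    This template is a hypothetical stand-in for the paper's prompt.
    """
    # Coarse per-frame mean keeps the prompt short enough for an LLM.
    summary = np.round(diff_cir.mean(axis=1), 3).tolist()
    lines = [f"Classify the gesture. Possible labels: {', '.join(labels)}."]
    for feat, lab in examples:
        lines.append(f"Features: {feat} -> Label: {lab}")
    lines.append(f"Features: {summary} -> Label:")
    return "\n".join(lines)
```

In a few-shot setting the prompt would carry a handful of labeled demonstrations per gesture class; how aggressively the differential CIR is summarized trades prompt length against how much gesture detail the LLM actually sees.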
Similar Papers
Rhythm in the Air: Vision-based Real-Time Music Generation through Gestures
Multimedia
Lets you make music by waving your hands.
Accessible Gesture-Driven Augmented Reality Interaction System
Human-Computer Interaction
Lets people with limited hand mobility control games with gestures.
Natural Multimodal Fusion-Based Human-Robot Interaction: Application With Voice and Deictic Posture via Large Language Model
Robotics
Robots understand what you want by voice and pointing.