Visuo-Acoustic Hand Pose and Contact Estimation
By: Yuemin Mao, Uksang Yoo, Yunchao Yao and more
Potential Business Impact:
Detects hidden hand touches using sound and sight
Accurately estimating hand pose and hand-object contact events is essential for robot data collection, immersive virtual environments, and biomechanical analysis, yet it remains challenging because of visual occlusion, subtle contact cues, the limitations of vision-only sensing, and the lack of accessible, flexible tactile sensing. We therefore introduce VibeMesh, a novel wearable system that fuses vision with active acoustic sensing for dense, per-vertex hand contact and pose estimation. VibeMesh integrates a bone-conduction speaker and sparse piezoelectric microphones distributed on a human hand: the speaker emits structured acoustic signals, and the microphones capture their propagation so that changes induced by contact can be inferred. To interpret these cross-modal signals, we propose a graph-based attention network that processes synchronized audio spectra and RGB-D-derived hand meshes to predict contact with high spatial resolution. We contribute: (i) a lightweight, non-intrusive visuo-acoustic sensing platform; (ii) a cross-modal graph network for joint pose and contact inference; (iii) a dataset of synchronized RGB-D, acoustic, and ground-truth contact annotations across diverse manipulation scenarios; and (iv) empirical results showing that VibeMesh outperforms vision-only baselines in accuracy and robustness, particularly in occluded or static-contact settings.
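To make the cross-modal idea in the abstract more concrete, the following is a minimal sketch of how an audio spectrum might be fused with hand-mesh vertices and passed through one attention step over the mesh graph to give a per-vertex contact probability. It is not the authors' architecture: the function names (`audio_spectrum`, `contact_probabilities`), the toy four-vertex mesh, the synthetic sine signal, and the random untrained weights are all assumptions made for illustration, and the attention step is a simplified single-head variant.

```python
import math
import random

def audio_spectrum(signal, n_bins):
    """Magnitudes of the first n_bins DFT coefficients (naive DFT, normalized by length)."""
    n = len(signal)
    mags = []
    for k in range(n_bins):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        mags.append(math.hypot(re, im) / n)
    return mags

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contact_probabilities(vertices, edges, spectrum, w_att, w_out):
    """One simplified attention step over the mesh graph (hypothetical, untrained)."""
    # Fuse modalities: each vertex feature is its xyz position plus the shared spectrum.
    feats = [list(v) + list(spectrum) for v in vertices]
    # Neighbor lists from the undirected mesh edges, with a self-loop per vertex.
    nbrs = {i: [i] for i in range(len(vertices))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    probs = []
    for i, f_i in enumerate(feats):
        # Score each neighbor from the concatenated (vertex, neighbor) features.
        alphas = softmax([dot(w_att, f_i + feats[j]) for j in nbrs[i]])
        # Attention-weighted average of neighbor features.
        agg = [sum(a * feats[j][d] for a, j in zip(alphas, nbrs[i]))
               for d in range(len(f_i))]
        # Linear readout followed by a sigmoid gives a contact probability.
        probs.append(1.0 / (1.0 + math.exp(-dot(w_out, agg))))
    return probs

# Toy inputs (assumptions): a 4-vertex mesh patch and a synthetic microphone signal.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = audio_spectrum(signal, n_bins=4)

rng = random.Random(0)
feat_dim = 3 + len(spectrum)
w_att = [rng.uniform(-1, 1) for _ in range(2 * feat_dim)]  # random, untrained weights
w_out = [rng.uniform(-1, 1) for _ in range(feat_dim)]

probs = contact_probabilities(vertices, edges, spectrum, w_att, w_out)
```

In a trained system the weights would be learned from synchronized RGB-D and acoustic recordings with contact labels, and the spectrum would come from the piezoelectric microphones rather than a synthetic sine wave. Here the output is only a list of four values in the range 0 to 1, one per vertex.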
Similar Papers
VibeCheck: Using Active Acoustic Tactile Sensing for Contact-Rich Manipulation
Robotics
Robot fingers "hear" objects to grab and move them.
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
CV and Pattern Recognition
Makes 3D body tracking work even with touching.
In-Hand Object Pose Estimation via Visual-Tactile Fusion
Robotics
Robots can grab and move things better.