VM-BHINet: Vision Mamba Bimanual Hand Interaction Network for 3D Interacting Hand Mesh Recovery From a Single RGB Image
By: Han Bi, Ge Yu, Yu He, and more
Potential Business Impact:
Makes 3D computer models of interacting hands more realistic.
Understanding bimanual hand interactions is essential for realistic 3D pose and shape reconstruction. However, existing methods struggle with occlusions, ambiguous appearances, and computational inefficiencies. To address these challenges, we propose Vision Mamba Bimanual Hand Interaction Network (VM-BHINet), introducing state space models (SSMs) into hand reconstruction to enhance interaction modeling while improving computational efficiency. The core component, Vision Mamba Interaction Feature Extraction Block (VM-IFEBlock), combines SSMs with local and global feature operations, enabling deep understanding of hand interactions. Experiments on the InterHand2.6M dataset show that VM-BHINet reduces mean per-joint position error (MPJPE) and mean per-vertex position error (MPVPE) by 2-3%, significantly surpassing state-of-the-art methods.
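The abstract describes the VM-IFEBlock as an SSM combined with local and global feature operations over image features. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation: the SimpleSSM recurrence, the depthwise-convolution local branch, the pooled global branch, and all class and parameter names (SimpleSSM, VMIFEBlockSketch, state_dim, etc.) are assumptions made for this example.

```python
# Hypothetical sketch of a VM-IFEBlock-style module (assumptions, not the paper's code).
# Input: a sequence of image-feature tokens (B, L, D); output: same shape, residual-updated.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """Simplified diagonal state-space scan standing in for the Vision Mamba SSM."""

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(dim, state_dim) * 0.01)  # per-channel state decay
        self.B = nn.Linear(dim, state_dim)                         # input -> state update
        self.C = nn.Linear(state_dim, dim)                         # state -> output features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, length, dim = x.shape
        h = x.new_zeros(batch, dim, self.A.shape[1])               # hidden state per channel
        decay = torch.sigmoid(self.A)                              # keep the recurrence stable
        outs = []
        for t in range(length):                                    # sequential scan, for clarity only
            u = self.B(x[:, t])                                    # (batch, state_dim)
            h = decay.unsqueeze(0) * h + u.unsqueeze(1)            # broadcast update over channels
            outs.append(self.C(h.mean(dim=1)))                     # pool channel states -> (batch, dim)
        return torch.stack(outs, dim=1)                            # (batch, length, dim)


class VMIFEBlockSketch(nn.Module):
    """Hypothetical block: SSM branch + local (depthwise conv) + global (pooled) branches."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ssm = SimpleSSM(dim)
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)  # local context
        self.global_proj = nn.Linear(dim, dim)                                  # global context
        self.fuse = nn.Linear(3 * dim, dim)                                     # merge the three branches

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:        # tokens: (B, L, D)
        x = self.norm(tokens)
        ssm_feat = self.ssm(x)                                       # long-range interaction modeling
        local_feat = self.local(x.transpose(1, 2)).transpose(1, 2)   # per-token local features
        global_feat = self.global_proj(x.mean(dim=1, keepdim=True)).expand_as(x)
        fused = self.fuse(torch.cat([ssm_feat, local_feat, global_feat], dim=-1))
        return tokens + fused                                        # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 196, 256)      # e.g. 14x14 backbone tokens for a two-hand image
    block = VMIFEBlockSketch(dim=256)
    print(block(feats).shape)             # torch.Size([2, 196, 256])
```

The three-branch fusion here only mirrors the abstract's description (SSM plus local and global feature operations); the actual block structure, scan direction, and fusion scheme in VM-BHINet may differ.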
Similar Papers
DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions
CV and Pattern Recognition
Helps computers see hidden hand parts better.
ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface
Robotics
Helps robots learn to do tricky tasks with their hands.
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
CV and Pattern Recognition
Shows how two hands hold anything in 3D.