Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction
By: Gaoge Han , Yongkang Cheng , Zhe Chen and more
Potential Business Impact:
Fixes computer drawings of hands that overlap.
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions, causing significant difficulty in achieving plausible interaction alignment. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. To tackle this, we propose a dual-stage Foundation-to-Diffusion framework that precisely align 2D prior guidance from vision foundation models and diffusion-based generative 3D interaction refinement to achieve occlusion-robust two-hand reconstruction. First, we introduce a lightweight fusion alignment encoder that aligns fused multimodal 2D priors like key points, segmentation maps, and depth cues from vision foundation models during training. This provides robust structured guidance, further enabling efficient inference without heavy foundation model encoders at test time while maintaining high reconstruction accuracy. Second, we implement a two-hand diffusion model explicitly trained to convert interpenetrated 3D poses into plausible, penetration-free counterparts. Through collision gradient-guided denoising, the model rectifies artifacts while preserving natural spatial relationships between hands. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on InterHand2.6M, HIC, and FreiHAND datasets, significantly advancing occlusion handling and interaction robustness. Our code will be publicly released.
Similar Papers
Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance
CV and Pattern Recognition
Makes 3D object shapes from one picture.
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
CV and Pattern Recognition
Makes computer hands move realistically when touching things.
3D Hand Mesh-Guided AI-Generated Malformed Hand Refinement with Hand Pose Transformation via Diffusion Model
CV and Pattern Recognition
Fixes weird computer-drawn hands in pictures.