Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors
By: Giorgos Karvounas, Nikolaos Kyriazis, Iason Oikonomidis, and more
Potential Business Impact:
Makes computer-generated hands look more realistic and accurate.
We revisit the role of texture in monocular 3D hand reconstruction, not as an afterthought for photorealism, but as a dense, spatially grounded cue that can actively support pose and shape estimation. Our observation is simple: even in high-performing models, the overlay between predicted hand geometry and image appearance is often imperfect, suggesting that texture alignment may be an underused supervisory signal. We propose a lightweight texture module that embeds per-pixel observations into UV texture space and enables a novel dense alignment loss between predicted and observed hand appearances. Our approach assumes access to a differentiable rendering pipeline and a model that maps images to 3D hand meshes with known topology, allowing us to back-project a textured hand onto the image and perform pixel-based alignment. The module is self-contained and easily pluggable into existing reconstruction pipelines. To isolate and highlight the value of texture-guided supervision, we augment HaMeR, a high-performing yet unadorned transformer architecture for 3D hand pose estimation. The resulting system improves both accuracy and realism, demonstrating the value of appearance-guided alignment in hand reconstruction.
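The dense alignment idea described above can be sketched as a masked per-pixel loss between the image rendered from the predicted textured mesh and the observed image. The paper does not specify the exact loss form, so the function name, the L1 penalty, and the silhouette mask below are illustrative assumptions, not the authors' implementation; in a real pipeline the rendered image and mask would come from a differentiable renderer so the loss can backpropagate to pose and shape.

```python
import numpy as np

def dense_texture_alignment_loss(rendered, observed, mask):
    """Masked per-pixel L1 between a rendered textured hand and the image.

    rendered, observed: (H, W, 3) float arrays in [0, 1].
    mask: (H, W) boolean silhouette of the back-projected hand,
          so only pixels covered by the predicted mesh contribute.
    NOTE: this is a hypothetical sketch of the general technique,
    not the loss used in the paper.
    """
    diff = np.abs(rendered - observed)          # per-pixel appearance error
    n = int(mask.sum())
    if n == 0:                                  # hand not visible: no signal
        return 0.0
    return float(diff[mask].sum() / (n * 3))    # mean over masked pixels/channels

# Toy usage: a uniform 0.5-gray render against a black image, full mask.
rendered = np.full((2, 2, 3), 0.5)
observed = np.zeros((2, 2, 3))
mask = np.ones((2, 2), dtype=bool)
loss = dense_texture_alignment_loss(rendered, observed, mask)  # 0.5
```

In practice such a loss would be added to the standard keypoint and shape terms, so that misaligned texture (a poor image overlay) directly penalizes the pose estimate.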
Similar Papers
A Scalable Attention-Based Approach for Image-to-3D Texture Mapping
CV and Pattern Recognition
Makes 3D objects look real from one picture.
Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance
CV and Pattern Recognition
Makes 3D object shapes from one picture.
Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization
CV and Pattern Recognition
Creates realistic 3D models from pictures.