MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
By: Shibo Wang, Haonan He, Maria Parelli, and more
Potential Business Impact:
Reconstructs the hidden parts of hand-held objects in videos.
Most RGB-based hand-object reconstruction methods rely on object templates, while template-free methods typically assume full object visibility. This assumption often breaks down in real-world settings, where fixed camera viewpoints and static grips leave parts of the object unobserved, resulting in implausible reconstructions. To overcome this, we present MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited viewpoint variation. Our key insight is that, despite the scarcity of paired 3D hand-object data, large-scale novel view synthesis diffusion models offer rich object supervision. This supervision serves as a prior to regularize unseen object regions during hand interactions. Leveraging this insight, we integrate a novel view synthesis model into our hand-object reconstruction framework. We further align the hand to the object by incorporating visible contact constraints. Our results demonstrate that MagicHOI significantly outperforms existing state-of-the-art hand-object reconstruction methods. We also show that novel view synthesis diffusion priors effectively regularize unseen object regions, enhancing 3D hand-object reconstruction.
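To make the abstract's idea concrete, here is a rough, hedged sketch (not the authors' code) of the kind of objective it describes: a score-distillation-style regularizer driven by a novel-view-synthesis diffusion model for unseen object regions, combined with a visible-contact alignment term. All names, the detach-based SDS approximation, the nearest-neighbour contact loss, and the loss weights are illustrative assumptions, not MagicHOI's actual implementation.

```python
# Hedged sketch: combining a novel-view diffusion prior with a visible-contact
# term in one training objective. All functions and weights are assumptions.
import torch


def sds_style_regularizer(rendered_view: torch.Tensor,
                          diffusion_score: torch.Tensor) -> torch.Tensor:
    """Score-distillation-style term: the gradient w.r.t. the rendered unseen
    view equals the (detached) denoising direction predicted by a novel-view
    diffusion model, pushing the render toward the prior's preferred image."""
    return (rendered_view * diffusion_score.detach()).mean()


def visible_contact_loss(hand_contact_pts: torch.Tensor,
                         object_surface_pts: torch.Tensor) -> torch.Tensor:
    """Pull hand vertices flagged as in visible contact toward their nearest
    object surface points (simple nearest-neighbour distance)."""
    dists = torch.cdist(hand_contact_pts, object_surface_pts)  # (H, O)
    return dists.min(dim=1).values.mean()


def total_loss(rendered_view, diffusion_score,
               hand_contact_pts, object_surface_pts,
               recon_loss,                       # observed-view RGB/mask term
               w_prior: float = 0.1, w_contact: float = 1.0) -> torch.Tensor:
    """Observed-view reconstruction + diffusion-prior regularization of unseen
    regions + visible-contact hand-object alignment."""
    return (recon_loss
            + w_prior * sds_style_regularizer(rendered_view, diffusion_score)
            + w_contact * visible_contact_loss(hand_contact_pts, object_surface_pts))


if __name__ == "__main__":
    # Toy tensors standing in for a rendered novel view, a diffusion score,
    # hand contact vertices, and sampled object surface points.
    view = torch.rand(3, 64, 64, requires_grad=True)
    score = torch.rand(3, 64, 64)
    hand_pts = torch.rand(20, 3, requires_grad=True)
    obj_pts = torch.rand(500, 3)
    loss = total_loss(view, score, hand_pts, obj_pts, recon_loss=torch.tensor(0.0))
    loss.backward()
    print("total loss:", loss.item())
```

The `detach()` construction is the common way score-distillation gradients are implemented in practice, and the contact term here is a generic nearest-neighbour distance; the paper's actual contact formulation and rendering pipeline may differ.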
Similar Papers
Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation
CV and Pattern Recognition
Makes realistic videos of hands touching objects.
Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
Graphics
Creates realistic 3D actions from text descriptions.
Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance
CV and Pattern Recognition
Makes 3D object shapes from one picture.