Zero-shot Reconstruction of In-Scene Object Manipulation from Video

Published: December 22, 2025 | arXiv ID: 2512.19684v1

By: Dixuan Lin, Tianyou Wang, Zhuoyang Pan and more

We present the first system for reconstructing in-scene object manipulation from a monocular RGB video. The task is challenging because scene reconstruction is ill-posed, hand-object depth is ambiguous, and interactions must be physically plausible. Existing methods operate in hand-centric coordinates and ignore the scene, which limits metric accuracy and practical use. Our method first uses data-driven foundation models to initialize the core components: the object mesh and poses, the scene point cloud, and the hand poses. We then apply a two-stage optimization that recovers a complete hand-object motion, from grasping to interaction, that remains consistent with the scene information observed in the input video.
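As a rough illustration of why hand-object depth is ambiguous in a single RGB view, the sketch below projects a 3D point through a simple pinhole camera and shows that the same point pushed twice as far away (and scaled up accordingly) lands on the same pixel. This is not code from the paper: the `project` and `scale` helpers, the focal length, and the sample hand coordinates are all hypothetical and chosen only for the example.

```python
def project(point, focal=500.0):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to image-plane offsets."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scale(point, s):
    """Scale a 3D point about the camera centre by factor s."""
    return tuple(s * c for c in point)


# Hypothetical hand joint in camera coordinates (metres); an assumption for illustration.
hand = (0.10, -0.05, 0.60)

near = project(hand)
far = project(scale(hand, 2.0))  # twice as far from the camera, twice as large

# Both hypotheses explain the same pixel, so image evidence alone cannot pick a depth.
same_pixel = all(abs(a - b) < 1e-9 for a, b in zip(near, far))
```

Anchoring the hand and object to a metrically reconstructed scene is one way to break this tie, which is the kind of constraint the abstract refers to when it says the recovered motion stays consistent with the scene.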

Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video

CV and Pattern Recognition

Makes videos look real, like you're there.

12 Dec 2025 2

90%

Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance

CV and Pattern Recognition

Makes 3D object shapes from one picture.

25 Aug 2025 0

89%

Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

CV and Pattern Recognition

Helps robots see and grab objects better.

4 Dec 2025 0
