WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human
By: Qijun Ying, Zhongyuan Hu, Rui Zhang, and more
Potential Business Impact:
Reconstructs how people move through 3D space from ordinary videos.
Global human motion reconstruction from in-the-wild monocular videos is in growing demand across VR, graphics, and robotics applications, yet it requires accurately mapping human poses from camera to world coordinates, a task complicated by depth ambiguity, motion ambiguity, and the entanglement between camera and human movements. While human-motion-centric approaches excel at preserving motion details and physical plausibility, they suffer from two critical limitations: insufficient exploitation of camera orientation information and ineffective integration of camera translation cues. We present WATCH (World-aware Allied Trajectory and pose reconstruction for Camera and Human), a unified framework that addresses both challenges. Our approach introduces an analytical heading angle decomposition technique that offers superior efficiency and extensibility compared to existing geometric methods. Additionally, we design a camera trajectory integration mechanism inspired by world models, which provides an effective way to leverage camera translation information beyond naive hard-decoding approaches. In experiments on in-the-wild benchmarks, WATCH achieves state-of-the-art performance in end-to-end trajectory reconstruction. Our work demonstrates the effectiveness of jointly modeling camera-human motion relationships and offers new insights into the long-standing challenge of integrating camera translation in global human motion reconstruction. The code will be made publicly available.
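The abstract does not say how WATCH implements its heading angle decomposition or its camera-to-world mapping, so the following is only a generic sketch of those two geometric steps, not the authors' method. It assumes a gravity-aligned world frame with +y up, a body whose local +z axis is its forward direction, and a known camera-to-world rotation `R_wc` and camera position `t_wc`. All function names (`camera_to_world`, `decompose_heading`, and the small matrix helpers) are hypothetical. The heading is computed analytically by projecting the forward axis onto the ground plane, and the remaining rotation is returned as the tilt.

```python
import math

# Minimal 3x3 helpers on nested lists (row-major), to stay dependency-free.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_y(theta):
    """Rotation about the assumed world up axis (+y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(theta):
    """Rotation about the x axis (used here to build a tilted test pose)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def camera_to_world(R_wc, t_wc, R_root_c, p_root_c):
    """Map a root orientation and position from camera to world coordinates.

    R_wc: camera-to-world rotation; t_wc: camera position in the world.
    Both are assumed to be known (e.g. estimated elsewhere).
    """
    R_root_w = matmul(R_wc, R_root_c)
    p_root_w = [a + b for a, b in zip(matvec(R_wc, p_root_c), t_wc)]
    return R_root_w, p_root_w

def decompose_heading(R):
    """Split R into rot_y(heading) @ tilt, with heading taken from the forward axis.

    The forward axis is assumed to be local +z, i.e. the third column of R.
    If that axis points straight up or down the heading is ill-defined and
    math.atan2(0.0, 0.0) returns 0.0.
    """
    fx, fz = R[0][2], R[2][2]          # ground-plane components of the forward axis
    heading = math.atan2(fx, fz)       # yaw about +y
    tilt = matmul(transpose(rot_y(heading)), R)  # residual rotation with zero heading
    return heading, tilt

# Usage: a pose built as a 0.7 rad yaw followed by a 0.3 rad tilt about x.
R = matmul(rot_y(0.7), rot_x(0.3))
heading, tilt = decompose_heading(R)
print(round(heading, 6))  # → 0.7
```

Reading the heading off the projected forward axis is one common analytical convention; a swing-twist decomposition about the up axis can give a different yaw for strongly tilted poses, and neither should be taken as the paper's exact formulation.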
Similar Papers
RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space
CV and Pattern Recognition
Lets you make videos of anyone doing anything.
SHARE: Scene-Human Aligned Reconstruction
CV and Pattern Recognition
Puts people in 3D worlds accurately from videos.
LookOut: Real-World Humanoid Egocentric Navigation
CV and Pattern Recognition
Helps robots and computers understand where you're looking.