Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
By: Zeren Jiang, Chuanxia Zheng, Iro Laina, and more
Potential Business Impact:
Turns regular videos into moving 3D worlds.
We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic priors captured by large-scale pre-trained video models, Geo4D can be trained using only synthetic data while generalizing well to real data in a zero-shot manner. Geo4D predicts several complementary geometric modalities, namely point, disparity, and ray maps. We propose a new multi-modal alignment algorithm to align and fuse these modalities, as well as a sliding window approach at inference time, thus enabling robust and accurate 4D reconstruction of long videos. Extensive experiments across multiple benchmarks show that Geo4D significantly surpasses state-of-the-art video depth estimation methods.
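The abstract mentions two mechanics that are easy to get wrong in practice. The first is turning a disparity map plus a ray map into a point map. The second is stitching per-window predictions from a sliding window over a long video. The Python sketch below illustrates both under stated assumptions and is not Geo4D's implementation. The helper names (`points_from_disparity_and_rays`, `sliding_windows`, `least_squares_scale`, `stitch_windows`) are hypothetical. The ray map is assumed to store a per-pixel origin and a direction scaled to unit camera-space depth. Each window's prediction is assumed to be correct up to one unknown scale. The scale-only least-squares fit is a deliberately simple stand-in for the paper's multi-modal alignment, which aligns and fuses the point, disparity, and ray maps jointly.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def points_from_disparity_and_rays(
    disparity: Sequence[float],
    origins: Sequence[Vec3],
    directions: Sequence[Vec3],
    eps: float = 1e-8,
) -> List[Vec3]:
    """Back-project pixels: point = origin + depth * direction, depth = 1 / disparity.

    Assumption (hypothetical encoding): each direction is scaled so that moving
    one unit along it increases camera-space depth by one.
    """
    points = []
    for d, o, r in zip(disparity, origins, directions):
        depth = 1.0 / max(d, eps)  # clamp to avoid division by zero
        points.append(tuple(oi + depth * ri for oi, ri in zip(o, r)))
    return points


def sliding_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Overlapping [start, end) frame ranges that together cover every frame."""
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:  # add a final window for the tail frames
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def least_squares_scale(reference: Sequence[float], target: Sequence[float]) -> float:
    """Scale s minimizing sum((s * t - r)^2) over paired values."""
    num = sum(r * t for r, t in zip(reference, target))
    den = sum(t * t for t in target)
    return num / den if den > 0 else 1.0


def stitch_windows(
    windows: Sequence[Tuple[int, int]],
    predictions: Sequence[Sequence[Sequence[float]]],
    num_frames: int,
) -> List[List[float]]:
    """Fuse per-window, per-frame depth values into one sequence.

    Each new window is rescaled to agree with the frames fused so far
    (scale-only fit on the overlap), then overlapping frames are averaged.
    Assumes the windows cover every frame, as sliding_windows guarantees.
    """
    sums: List[Optional[List[float]]] = [None] * num_frames
    counts = [0] * num_frames
    for (start, end), pred in zip(windows, predictions):
        ref, tgt = [], []
        for f in range(start, end):
            if counts[f]:  # frame already fused: use it as the alignment target
                ref.extend(v / counts[f] for v in sums[f])
                tgt.extend(pred[f - start])
        s = least_squares_scale(ref, tgt) if ref else 1.0
        for f in range(start, end):
            scaled = [s * v for v in pred[f - start]]
            sums[f] = scaled if counts[f] == 0 else [a + b for a, b in zip(sums[f], scaled)]
            counts[f] += 1
    return [[v / c for v in frame] for frame, c in zip(sums, counts)]


if __name__ == "__main__":
    print(points_from_disparity_and_rays([0.5], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)]))
    wins = sliding_windows(num_frames=7, window=4, stride=3)
    truth = [[f + 1.0, 2.0 * (f + 1.0)] for f in range(7)]
    # Simulate window k being predicted at an arbitrary scale 1 / (k + 1).
    preds = [[[v / (k + 1) for v in truth[f]] for f in range(s, e)]
             for k, (s, e) in enumerate(wins)]
    print(wins)
    print(stitch_windows(wins, preds, 7))
```

Window size and stride trade cost against robustness: more overlapping frames give the scale fit more constraints, but every frame is then processed more than once.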
Similar Papers
Geometry-aware 4D Video Generation for Robot Manipulation
CV and Pattern Recognition
Robots predict future movements from new angles.
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization
Graphics
Creates realistic moving 3D objects from videos.
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
CV and Pattern Recognition
Makes one picture move and change like a video.