GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation
By: Yuhao Wan, Lijuan Liu, Jingzhi Zhou, and more
Potential Business Impact:
Creates realistic 3D worlds from pictures.
Previous works that leverage video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the image-to-3D scene generation pipeline by unlocking the potential of geometry models, and present GeoWorld. Instead of exploiting geometric information obtained from a single-frame input, we propose to first generate consecutive video frames and then use a geometry model to extract full-frame geometry features. These features contain richer information than the single-frame depth maps or camera embeddings used in previous methods, and they serve as geometric conditions that aid the video generation model. To enhance the consistency of geometric structures, we further propose a geometry alignment loss, which provides the model with real-world geometric constraints, and a geometry adaptation module, which ensures effective utilization of the geometry features. Extensive experiments show that GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively. Project Page: https://peaes.github.io/GeoWorld/
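The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of the ordering of the stages (draft frames, then full-frame geometry features, then geometry-conditioned generation), not the authors' implementation. Every name here (`image_to_scene_frames`, `draft_video`, `geometry_model`, `conditioned_video`) is hypothetical, the frame and feature types are placeholders, and the geometry alignment loss and geometry adaptation module are not modeled because the abstract does not specify them.

```python
from typing import Callable, List, Sequence

Frame = List[float]    # placeholder for an image or video frame
Feature = List[float]  # placeholder for one frame's geometry feature


def image_to_scene_frames(
    image: Frame,
    trajectory: Sequence[float],
    draft_video: Callable[[Frame, Sequence[float]], List[Frame]],
    geometry_model: Callable[[List[Frame]], List[Feature]],
    conditioned_video: Callable[[Frame, Sequence[float], List[Feature]], List[Frame]],
) -> List[Frame]:
    """Hypothetical three-stage data flow; all callables are assumptions."""
    # Stage 1: generate consecutive video frames from the single input image.
    draft = draft_video(image, trajectory)
    # Stage 2: extract geometry features for every frame, not just the input.
    features = geometry_model(draft)
    if len(features) != len(draft):
        raise ValueError("expected one geometry feature per draft frame")
    # Stage 3: generate again, now conditioned on the full-frame features.
    return conditioned_video(image, trajectory, features)


# Toy stand-ins so the sketch runs end to end (they model nothing real).
def toy_draft(image: Frame, trajectory: Sequence[float]) -> List[Frame]:
    return [[pixel + step for pixel in image] for step in trajectory]


def toy_geometry(frames: List[Frame]) -> List[Feature]:
    return [[sum(frame) / len(frame)] for frame in frames]


def toy_conditioned(
    image: Frame, trajectory: Sequence[float], features: List[Feature]
) -> List[Frame]:
    return [[pixel + feature[0] for pixel in image] for feature in features]


frames = image_to_scene_frames(
    [0.0, 1.0], [0.0, 0.5, 1.0], toy_draft, toy_geometry, toy_conditioned
)
```

Passing the three models in as callables keeps the sketch independent of any particular video or geometry model; the only structural claim it makes is that geometry features are computed over all generated frames before the conditioned pass.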
Similar Papers
MagicWorld: Interactive Geometry-driven Video World Exploration
CV and Pattern Recognition
Creates stable, evolving worlds from your words.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
CV and Pattern Recognition
Turns regular videos into 3D moving worlds.
WorldGrow: Generating Infinite 3D World
CV and Pattern Recognition
Builds endless, realistic 3D worlds for games.