TiP4GEN: Text to Immersive Panorama 4D Scene Generation
By: Ke Xing, Hanwen Liang, Dejia Xu and more
Potential Business Impact:
Creates 360-degree moving virtual worlds from text.
With the rapid advancement and widespread adoption of VR/AR technologies, there is growing demand for high-quality, immersive dynamic scenes. However, existing generation methods focus predominantly on static scenes or narrow perspective-view dynamic scenes and fall short of delivering a truly 360-degree immersive experience from any viewpoint. In this paper, we introduce TiP4GEN, a text-to-dynamic-panorama scene generation framework that enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN combines panorama video generation with dynamic scene reconstruction to create 360-degree immersive virtual environments. For video generation, we introduce a Dual-branch Generation Model consisting of a panorama branch and a perspective branch, which handle global and local view generation, respectively. A bidirectional cross-attention mechanism enables comprehensive information exchange between the two branches. For scene reconstruction, we propose a Geometry-aligned Reconstruction Model based on 3D Gaussian Splatting. By aligning spatial-temporal point clouds using metric depth maps and initializing scene cameras with estimated poses, our method ensures geometric consistency and temporal coherence in the reconstructed scenes. Extensive experiments demonstrate the effectiveness of our proposed designs and the superiority of TiP4GEN in generating visually compelling, motion-coherent dynamic panoramic scenes. Our project page is at https://ke-xing.github.io/TiP4GEN/.
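The abstract describes a bidirectional cross-attention mechanism that lets the panorama branch and the perspective branch exchange information. The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes single-head scaled dot-product attention over flattened token sequences, a separate set of projection weights for each direction, and a residual update of each branch. The function names, the `params` dictionary, the weight shapes, and the token counts are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` attend to `context`.

    queries: (N, d), context: (M, d), w_q/w_k: (d, d_k), w_v: (d, d).
    Returns an (N, d) update for the query tokens.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity matrix
    return softmax(scores, axis=-1) @ v


def bidirectional_cross_attention(pano_tokens, persp_tokens, params):
    """Hypothetical two-way exchange between the two branches.

    The panorama tokens read from the perspective tokens and vice versa,
    each with its own weights; results are added back as residuals.
    """
    pano_update = cross_attention(pano_tokens, persp_tokens, *params["pano_from_persp"])
    persp_update = cross_attention(persp_tokens, pano_tokens, *params["persp_from_pano"])
    return pano_tokens + pano_update, persp_tokens + persp_update


# Usage with random (hypothetical) tokens and weights.
rng = np.random.default_rng(0)
d, d_k = 16, 8


def make_params():
    scale = 1.0 / np.sqrt(d)
    return (
        rng.normal(size=(d, d_k)) * scale,  # w_q
        rng.normal(size=(d, d_k)) * scale,  # w_k
        rng.normal(size=(d, d)) * scale,    # w_v
    )


params = {"pano_from_persp": make_params(), "persp_from_pano": make_params()}
pano = rng.normal(size=(64, d))   # e.g. flattened panorama-branch tokens
persp = rng.normal(size=(32, d))  # e.g. flattened perspective-branch tokens
new_pano, new_persp = bidirectional_cross_attention(pano, persp, params)
print(new_pano.shape, new_persp.shape)  # (64, 16) (32, 16)
```

In this sketch, giving each direction its own weights lets each branch learn what to read from the other, and the residual addition preserves a branch's own features when the cross-branch signal is weak.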
Similar Papers
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
CV and Pattern Recognition
Creates realistic 3D worlds from 2D pictures.
Generating 360° Video is What You Need For a 3D Scene
Graphics
Creates walkable 3D worlds from text.
Matrix-3D: Omnidirectional Explorable 3D World Generation
CV and Pattern Recognition
Creates 3D worlds from one picture or words.