Spatia: Video Generation with Updatable Spatial Memory
By: Jinjing Zhao, Fangyun Wei, Zhening Liu, and more
Potential Business Impact:
Keeps generated video scenes spatially consistent over time.
Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates the memory through visual SLAM. This design, which disentangles static scene geometry from dynamic content, enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
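The generate-then-update loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_clip` and `slam_update` are hypothetical stand-ins for the video model and the visual-SLAM module, the point cloud is a plain NumPy array, and the frame/point sizes are arbitrary toy values.

```python
import numpy as np

def generate_clip(memory, num_frames=4):
    # Hypothetical stand-in for the video generator: in Spatia this
    # would render frames conditioned on the point-cloud memory.
    rng = np.random.default_rng(len(memory))
    return rng.random((num_frames, 8, 8, 3))  # tiny fake frames

def slam_update(memory, clip):
    # Hypothetical stand-in for visual SLAM: in Spatia this would
    # recover new static 3D points from the clip and merge them
    # into the persistent memory.
    rng = np.random.default_rng(len(memory) + 1)
    new_points = rng.random((16, 3))
    return np.vstack([memory, new_points])

def spatia_loop(num_clips=3):
    memory = np.empty((0, 3))  # persistent spatial memory (point cloud)
    clips = []
    for _ in range(num_clips):
        clip = generate_clip(memory)        # condition generation on memory
        memory = slam_update(memory, clip)  # grow memory from the new clip
        clips.append(clip)
    return clips, memory
```

The key property the sketch captures is that the static scene representation persists and accumulates across clips, while each clip is generated fresh against it.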
Similar Papers
Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
CV and Pattern Recognition
Teaches computers to understand space from videos.
SPATIALGEN: Layout-guided 3D Indoor Scene Generation
CV and Pattern Recognition
Builds realistic 3D rooms from pictures.
Generative Spatiotemporal Data Augmentation
CV and Pattern Recognition
Makes computer vision work better with less data.