WORLDMEM: Long-term Consistent World Simulation with Memory
By: Zeqi Xiao, Yushi Lan, Yifan Zhou, et al.
Potential Business Impact:
Lets virtual worlds remember and change over time.
World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in preserving 3D spatial consistency. In this work, we present WorldMem, a framework that enhances scene generation with a memory bank of memory units, each storing a memory frame and its state (e.g., pose and timestamp). By employing a memory attention mechanism that extracts relevant information from these memory frames based on their states, our method can accurately reconstruct previously observed scenes, even under significant viewpoint or temporal gaps. Furthermore, by incorporating timestamps into the states, our framework not only models a static world but also captures its dynamic evolution over time, enabling both perception and interaction within the simulated world. Extensive experiments in both virtual and real scenarios validate the effectiveness of our approach.
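To make the memory-attention idea concrete, here is a minimal, hypothetical PyTorch sketch: a memory bank stores frame features together with their states (pose and timestamp), and a cross-attention layer conditions both queries and stored memories on embedded states, so retrieval can favor memories from matching viewpoints and times. All names (MemoryBank, MemoryAttention, state_dim, d_model) and the flattened state encoding are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of state-conditioned memory attention; not the
# authors' implementation. Assumes pose + timestamp are flattened into
# a small per-frame state vector of size state_dim.
import torch
import torch.nn as nn


class MemoryBank:
    """Stores memory units: a frame feature plus its state (pose, timestamp)."""

    def __init__(self):
        self.frames = []  # each: (d_model,) frame feature
        self.states = []  # each: (state_dim,) pose/timestamp vector

    def add(self, frame_feat: torch.Tensor, state: torch.Tensor) -> None:
        self.frames.append(frame_feat)
        self.states.append(state)

    def as_tensors(self):
        return torch.stack(self.frames), torch.stack(self.states)


class MemoryAttention(nn.Module):
    """Cross-attends from the frame being generated to stored memory frames.
    Features are offset by state embeddings so attention can match on
    viewpoint and time, not just appearance."""

    def __init__(self, d_model: int = 256, state_dim: int = 8, n_heads: int = 4):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query_feat, query_state, bank: MemoryBank):
        mem_frames, mem_states = bank.as_tensors()  # (M, d_model), (M, state_dim)
        q = (query_feat + self.state_embed(query_state)).view(1, 1, -1)
        kv = (mem_frames + self.state_embed(mem_states)).unsqueeze(0)  # (1, M, d)
        out, weights = self.attn(q, kv, kv)  # retrieve state-relevant memories
        return out.squeeze(0), weights.squeeze(0)


# Toy usage: store five fake memory frames, then query with a new pose/time.
bank = MemoryBank()
for _ in range(5):
    bank.add(torch.randn(256), torch.randn(8))
attn = MemoryAttention()
fused, w = attn(torch.randn(256), torch.randn(8), bank)
print(fused.shape, w.shape)  # torch.Size([1, 256]) torch.Size([1, 5])
```

In the paper's framing, conditioning retrieval on poses and timestamps is what allows previously observed scenes to be reconstructed after large viewpoint or temporal gaps; a practical system would additionally need a policy for when to write new units into the bank and how to bound its size, which this sketch leaves out.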
Similar Papers
Video World Models with Long-term Spatial Memory
CV and Pattern Recognition
Keeps computer-made videos consistent over time.
Learning 3D Persistent Embodied World Models
CV and Pattern Recognition
Lets robots remember and plan for the future.
WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
CV and Pattern Recognition
Lets computers imagine future video scenes better.