VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
By: Sixiao Zheng, Minghao Yin, Wenbo Hu and more
Potential Business Impact:
Creates realistic videos with controllable objects and cameras.
Video world models aim to simulate dynamic, real-world environments, yet existing methods struggle to provide unified and precise control over camera and multi-object motion, because videos inherently represent dynamics in the projected 2D image plane. To bridge this gap, we introduce VerseCrafter, a 4D-aware video world model that enables explicit and coherent control over both camera and object dynamics within a unified 4D geometric world state. Our approach is centered on a novel 4D Geometric Control representation, which encodes the world state through a static background point cloud and per-object 3D Gaussian trajectories. This representation captures not only an object's path but also its probabilistic 3D occupancy over time, offering a flexible, category-agnostic alternative to rigid bounding boxes or parametric models. These 4D controls are rendered into conditioning signals for a pretrained video diffusion model, enabling the generation of high-fidelity, view-consistent videos that precisely adhere to the specified dynamics. A further major challenge lies in the scarcity of large-scale training data with explicit 4D annotations. We address this by developing an automatic data engine that extracts the required 4D controls from in-the-wild videos, allowing us to train our model on a massive and diverse dataset.
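To make the 4D Geometric Control idea more concrete, the sketch below shows one way a world state of this shape could be held in memory and projected into a camera view. It is an illustration only, not the authors' implementation: the abstract does not say how the controls are rendered, so the pinhole camera, the first-order (Jacobian-based) covariance projection, and every name here (`GaussianState`, `WorldState4D`, `project_gaussian`, `render_frame`) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Plain-Python matrix product, sufficient for the tiny matrices used here."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Mat) -> Mat:
    return [list(row) for row in zip(*m)]


@dataclass
class GaussianState:
    """One object at one time step: 3D mean plus 3x3 covariance (hypothetical layout)."""
    mean: Vec3
    cov: Mat


@dataclass
class WorldState4D:
    """Static background points plus one Gaussian trajectory per object (hypothetical layout)."""
    background_points: List[Vec3] = field(default_factory=list)
    object_trajectories: Dict[str, List[GaussianState]] = field(default_factory=dict)


def to_camera(p: Vec3, R: Mat, t: Vec3) -> Vec3:
    """Apply a world-to-camera rigid transform: R @ p + t."""
    x, y, z = (sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project_gaussian(g: GaussianState, R: Mat, t: Vec3,
                     f: float, cx: float, cy: float
                     ) -> Optional[Tuple[Tuple[float, float], Mat]]:
    """Project a 3D Gaussian to a 2D mean and 2x2 covariance.

    Assumes a pinhole camera and linearizes the projection around the mean.
    Returns None when the mean lies behind the camera.
    """
    x, y, z = to_camera(g.mean, R, t)
    if z <= 0:
        return None
    uv = (f * x / z + cx, f * y / z + cy)
    # Jacobian of the pinhole projection evaluated at the camera-space mean.
    J = [[f / z, 0.0, -f * x / (z * z)],
         [0.0, f / z, -f * y / (z * z)]]
    cov_cam = matmul(matmul(R, g.cov), transpose(R))
    cov_2d = matmul(matmul(J, cov_cam), transpose(J))
    return uv, cov_2d


def render_frame(world: WorldState4D, frame: int, R: Mat, t: Vec3,
                 f: float, cx: float, cy: float) -> dict:
    """Collect the 2D control primitives for one frame under one camera pose."""
    pixels = []
    for p in world.background_points:
        x, y, z = to_camera(p, R, t)
        if z > 0:  # keep only points in front of the camera
            pixels.append((f * x / z + cx, f * y / z + cy))
    ellipses = {}
    for name, trajectory in world.object_trajectories.items():
        projected = project_gaussian(trajectory[frame], R, t, f, cx, cy)
        if projected is not None:
            ellipses[name] = projected
    return {"background": pixels, "objects": ellipses}
```

Because the camera pose is an argument to `render_frame` while object motion lives in `object_trajectories`, the two can be varied independently in this sketch, which mirrors the unified camera and object control the abstract describes. Turning the projected primitives into actual conditioning images for a diffusion model is outside the scope of the example.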
Similar Papers
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
CV and Pattern Recognition
Makes computers understand real-world videos like humans.
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling
CV and Pattern Recognition
Makes computers understand real-world videos better.
TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model
CV and Pattern Recognition
AI learns to remember and interact with changing worlds.