ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling
By: Qisen Wang, Yifan Zhao, Peisen Shen, and more
Potential Business Impact:
Generates realistic, 3D-consistent videos of a scene from multiple camera angles at once.
Although prevailing camera-controlled video generation models can produce cinematic results, lifting them directly to the generation of 3D-consistent, high-fidelity, time-synchronized multi-view videos remains challenging, yet this is a pivotal capability for taming 4D worlds. Some works resort to data augmentation or test-time optimization, but these strategies are constrained by limited model generalization and scalability issues. To this end, we propose ChronosObserver, a training-free method comprising a World State Hyperspace, which represents the spatiotemporal constraints of a 4D world scene, and Hyperspace Guided Sampling, which synchronizes the diffusion sampling trajectories of multiple views using that hyperspace. Experimental results demonstrate that our method achieves high-fidelity, 3D-consistent, time-synchronized multi-view video generation without any training or fine-tuning of diffusion models.
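To make the idea of synchronizing several views' diffusion trajectories more concrete, here is a minimal, hypothetical Python sketch of guided multi-view sampling. It is not the paper's actual algorithm: the functions `denoise_step` and `project_to_hyperspace`, the guidance weight `lam`, and the latent shapes are all illustrative assumptions. The sketch only shows the general pattern of denoising each view, pooling the views into a shared world state, and nudging every view's trajectory toward that state at each step.

```python
# Hypothetical sketch of hyperspace-guided multi-view sampling.
# NOT ChronosObserver's actual procedure; all names and values are assumptions.
import numpy as np

def denoise_step(x_t, t, rng):
    """Stand-in for one per-view diffusion denoising step (placeholder model)."""
    noise_pred = rng.standard_normal(x_t.shape) * 0.01   # dummy noise estimate
    return x_t - noise_pred                              # crude x_{t-1} update

def project_to_hyperspace(latents):
    """Toy shared world state: the mean latent across all views.
    A real system would first warp views into a common spatiotemporal frame."""
    return np.mean(np.stack(latents, axis=0), axis=0)

def hyperspace_guided_sampling(num_views=4, steps=50, lam=0.3,
                               shape=(8, 16, 16), seed=0):
    rng = np.random.default_rng(seed)
    # One noisy latent video per camera view.
    latents = [rng.standard_normal(shape) for _ in range(num_views)]
    for t in reversed(range(steps)):
        # 1) Independent denoising step per view.
        latents = [denoise_step(x, t, rng) for x in latents]
        # 2) Build the shared world-state representation from all views.
        world_state = project_to_hyperspace(latents)
        # 3) Pull each view toward the shared state, keeping the
        #    multi-view sampling trajectories synchronized.
        latents = [(1.0 - lam) * x + lam * world_state for x in latents]
    return latents

if __name__ == "__main__":
    videos = hyperspace_guided_sampling()
    print(len(videos), videos[0].shape)
```

Because the guidance is applied only during sampling, a scheme like this leaves the underlying video diffusion model untouched, which is what makes a training-free approach possible.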
Similar Papers
BulletTime: Decoupled Control of Time and Camera Pose for Video Generation
CV and Pattern Recognition
Lets you change what happens and where the camera looks.
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
CV and Pattern Recognition
Creates 3D worlds from a single picture.
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
CV and Pattern Recognition
Makes videos from any angle, time, or text.