WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling
By: Shaoheng Fang, Hanwen Jiang, Yunpeng Bai, and more
Potential Business Impact:
Generates realistic videos whose 3D geometry and motion stay consistent over time.
Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatiotemporally consistent. WorldReel jointly produces RGB frames together with 4D scene representations, including pointmaps, camera trajectory, and dense flow mapping, enabling coherent geometry and appearance modeling over time. Our explicit 4D representation enforces a single underlying scene that persists across viewpoints and dynamic content, yielding videos that remain consistent even under large non-rigid motion and significant camera movement. We train WorldReel by carefully combining synthetic and real data: synthetic data provides precise 4D supervision (geometry, motion, and camera), while real videos contribute visual diversity and realism. This blend allows WorldReel to generalize to in-the-wild footage while preserving strong geometric fidelity. Extensive experiments demonstrate that WorldReel sets a new state of the art for consistent video generation with dynamic scenes and moving cameras, improving geometric-consistency and motion-coherence metrics and reducing view-time artifacts relative to competing methods. We believe that WorldReel brings video generation closer to 4D-consistent world modeling, where agents can render, interact with, and reason about scenes through a single, stable spatiotemporal representation.
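To make concrete what "a single underlying scene" implies, the sketch below shows how a pointmap and a camera pose can be related to 2D flow. It is not WorldReel's implementation: the function names, the pinhole intrinsics K, the world-to-camera pose convention, and the assumption that the pointmap stores world-space 3D points per pixel are all illustrative choices. For static content, projecting the frame-t pointmap into the next camera should land at the original pixel plus the flow; a mismatch would point to either dynamic content or inconsistent geometry.

```python
import numpy as np

def project(points_world, K, world_to_cam):
    """Project (H, W, 3) world-space points to (H, W, 2) pixel coordinates."""
    R, t = world_to_cam[:3, :3], world_to_cam[:3, 3]
    cam = points_world @ R.T + t           # world -> camera coordinates
    uvw = cam @ K.T                        # pinhole projection (homogeneous)
    return uvw[..., :2] / uvw[..., 2:3]    # perspective divide

def induced_flow(pointmap_t, K, world_to_cam_next):
    """2D flow that a static scene would produce between frame t and t+1."""
    H, W, _ = pointmap_t.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v], axis=-1).astype(np.float64)
    return project(pointmap_t, K, world_to_cam_next) - pixels

# Toy example (hypothetical values): a fronto-parallel plane at depth 2,
# seen by an identity camera at frame t.
K = np.array([[100.0, 0.0, 8.0],
              [0.0, 100.0, 6.0],
              [0.0, 0.0, 1.0]])
H, W = 12, 16
u, v = np.meshgrid(np.arange(W), np.arange(H))
pix_h = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
pointmap = 2.0 * (pix_h @ np.linalg.inv(K).T)   # back-project pixels at depth 2

# Next frame: world-to-camera transform with a 0.1 offset along x.
pose_next = np.eye(4)
pose_next[0, 3] = 0.1

flow = induced_flow(pointmap, K, pose_next)
# Every pixel shifts by fx * 0.1 / depth = 5 px horizontally, 0 px vertically.
```

In this toy setup, comparing `flow` against a predicted flow field would give a simple per-pixel consistency residual for static regions; how WorldReel itself couples these outputs is not specified in the abstract.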
Similar Papers
GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation
CV and Pattern Recognition
Creates realistic 3D worlds from pictures.
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
CV and Pattern Recognition
Makes one picture move and change like a video.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
CV and Pattern Recognition
Turns regular videos into 3D moving worlds.