EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh
By: Tao Hu, Haoyang Peng, Xiao Liu, and more
Potential Business Impact:
Makes videos look real from any angle.
Generating high-quality camera-controllable videos from monocular input is a challenging task, particularly under extreme viewpoints. Existing methods often suffer from geometric inconsistencies and occlusion artifacts at region boundaries, which degrade visual quality. In this paper, we introduce EX-4D, a novel framework that addresses these challenges through a Depth Watertight Mesh representation. The representation serves as a robust geometric prior by explicitly modeling both visible and occluded regions, ensuring geometric consistency under extreme camera poses. To overcome the lack of paired multi-view datasets, we propose a simulated masking strategy that generates effective training data from monocular videos alone. Additionally, a lightweight LoRA-based video diffusion adapter is employed to synthesize high-quality, physically consistent, and temporally coherent videos. Extensive experiments demonstrate that EX-4D outperforms state-of-the-art methods in physical consistency and extreme-view quality, enabling practical 4D video generation.
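To make the simulated masking idea concrete, the sketch below shows one plausible way such masks can be produced from monocular input: unproject a frame's depth map into 3D, reproject the points into a simulated novel camera, and mark pixels the novel view cannot see as occluded. This is an illustrative approximation under a simple pinhole model, not the paper's actual implementation; `simulate_occlusion_mask` and its arguments are hypothetical names.

```python
import numpy as np

def simulate_occlusion_mask(depth, K, pose):
    """Illustrative sketch (not EX-4D's implementation): warp a monocular
    depth map into a simulated novel camera and return a boolean mask that
    is True where the novel view receives no projected source pixel.

    depth: (H, W) per-pixel depth in the source view
    K:     (3, 3) pinhole intrinsics, assumed shared by both views
    pose:  (4, 4) rigid transform from source to novel camera frame
    """
    H, W = depth.shape
    # Unproject every source pixel to a 3D point in the source camera frame.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)
    pts = np.linalg.inv(K) @ (pix * depth.reshape(1, -1))
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])  # homogeneous coords
    # Transform into the novel camera frame and project back to pixels.
    cam = (pose @ pts_h)[:3]
    proj = K @ cam
    z = proj[2]
    valid = z > 1e-6  # keep only points in front of the novel camera
    uu = np.round(proj[0, valid] / z[valid]).astype(int)
    vv = np.round(proj[1, valid] / z[valid]).astype(int)
    inb = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H)
    covered = np.zeros((H, W), dtype=bool)
    covered[vv[inb], uu[inb]] = True
    # Uncovered pixels are disoccluded regions the model must inpaint.
    return ~covered
```

Masks produced this way can pair a monocular frame with a "what the novel view cannot see" target, standing in for the paired multi-view data that is otherwise unavailable.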
Similar Papers
Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video
CV and Pattern Recognition
Creates 3D models of moving things from videos.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
CV and Pattern Recognition
Turns regular videos into 3D moving worlds.
Computer Vision and Deep Learning for 4D Augmented Reality
CV and Pattern Recognition
Makes 3D videos work better in virtual reality.