Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation
By: Minghao Yin, Yukang Cao, Songyou Peng, and more
Potential Business Impact:
Makes 3D videos look real and move smoothly.
Generating high-quality 4D content from monocular videos for applications such as digital humans and AR/VR poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance effectively. To overcome these challenges, we introduce Splat4D, a novel framework enabling high-fidelity 4D content generation from a monocular video. Splat4D achieves superior performance while maintaining faithful spatio-temporal coherence by leveraging multi-view rendering, inconsistency identification, a video diffusion model, and an asymmetric U-Net for refinement. Through extensive evaluations on public benchmarks, Splat4D consistently demonstrates state-of-the-art performance across various metrics, underscoring the efficacy of our approach. Additionally, the versatility of Splat4D is validated in various applications such as text/image conditioned 4D generation, 4D human generation, and text-guided content editing, producing coherent results that follow user instructions.
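The abstract's pipeline of multi-view rendering, inconsistency identification, and diffusion-based refinement can be illustrated with a toy sketch. The code below is not the paper's method: it is a minimal, hypothetical stand-in that flags pixels where aligned multi-view renders disagree (via cross-view variance) and "refines" them by replacing masked pixels with the cross-view mean, where the actual paper would invoke a video diffusion model and an asymmetric U-Net. All function names and the variance threshold are assumptions for illustration.

```python
import numpy as np

def identify_inconsistency(views: np.ndarray, thresh: float) -> np.ndarray:
    """Flag pixels where aligned renders from different viewpoints disagree.

    views: array of shape (num_views, H, W) holding aligned renderings
    of the same timestep. Returns a boolean (H, W) mask of pixels whose
    cross-view variance exceeds `thresh`.
    """
    per_pixel_var = views.var(axis=0)  # variance across the view axis
    return per_pixel_var > thresh

def refine_views(views: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in for diffusion refinement: overwrite inconsistent pixels
    with the cross-view mean so all views agree there. The real system
    would instead condition a video diffusion model on the mask."""
    refined = views.copy()
    cross_view_mean = views.mean(axis=0)
    refined[:, mask] = cross_view_mean[mask]  # broadcast mean into each view
    return refined

# Toy example: three 4x4 views that agree everywhere except one pixel.
views = np.zeros((3, 4, 4))
views[0, 1, 1] = 3.0  # view 0 disagrees at pixel (1, 1)
mask = identify_inconsistency(views, thresh=0.5)
refined = refine_views(views, mask)
```

After refinement, all three views hold the same value at the flagged pixel, mimicking how the full framework enforces spatio-temporal coherence across rendered viewpoints before updating the 4D Gaussians.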
Similar Papers
VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling
CV and Pattern Recognition
Makes videos of moving things look clearer.
Detail Enhanced Gaussian Splatting for Large-Scale Volumetric Capture
Graphics
Makes movie characters look super real.
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
CV and Pattern Recognition
Makes 3D videos more real, even with missing parts.