MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
By: Yingshuang Zou, Yikang Ding, Chuanrui Zhang, and more
Potential Business Impact:
Lets self-driving systems render urban scenes realistically from viewpoints far off the recorded driving route.
Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods degrade substantially when viewpoints deviate significantly from the training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates a Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, and it provides comprehensive supervision signals that refine 3DGS representations, enhancing rendering robustness under extreme viewpoint changes. Experiments on the Waymo Open Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.
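To make the conditioning step concrete, the sketch below shows one common way to turn aggregated, colorized LiDAR points into the sparse RGB and depth maps that a target novel view could feed to the diffusion model. This is a minimal illustration under assumed conventions (a 3x3 intrinsic matrix K, a 4x4 world-to-camera matrix w2c, and a simple per-pixel z-buffer); it is not the paper's exact implementation.

```python
# Minimal sketch: rasterize aggregated, colorized LiDAR points into a novel
# camera to obtain sparse RGB / depth condition maps. The interface (K, w2c,
# z-buffer splatting of single pixels) is an assumption for illustration only.
import numpy as np

def project_points_to_condition_maps(points_world, colors, K, w2c, height, width):
    """points_world: (N, 3) world-space points; colors: (N, 3) RGB in [0, 1].
    Returns a sparse RGB image and a sparse depth map for the target camera."""
    # Transform points into camera coordinates.
    pts_h = np.concatenate([points_world, np.ones((points_world.shape[0], 1))], axis=1)
    pts_cam = (w2c @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 1e-3
    pts_cam, colors = pts_cam[in_front], colors[in_front]

    # Perspective projection with intrinsics K.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u, v, z = uv[:, 0].astype(int), uv[:, 1].astype(int), pts_cam[:, 2]

    # Discard projections that fall outside the image.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[valid], v[valid], z[valid], colors[valid]

    # Z-buffer: the nearest point wins at each pixel.
    sparse_rgb = np.zeros((height, width, 3), dtype=np.float32)
    sparse_depth = np.full((height, width), np.inf, dtype=np.float32)
    for i in range(z.shape[0]):
        if z[i] < sparse_depth[v[i], u[i]]:
            sparse_depth[v[i], u[i]] = z[i]
            sparse_rgb[v[i], u[i]] = colors[i]
    sparse_depth[np.isinf(sparse_depth)] = 0.0  # 0 marks pixels with no LiDAR hit
    return sparse_rgb, sparse_depth
```

In practice the resulting sparse maps would be stacked with any other geometric priors and passed as conditioning channels to the video diffusion model, whose dense RGB, depth, and semantic outputs then serve as supervision for refining the 3DGS representation.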
Similar Papers
Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis
CV and Pattern Recognition
Creates realistic 3D worlds from a few pictures.
VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling
CV and Pattern Recognition
Makes videos of moving things look clearer.
ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs
CV and Pattern Recognition
Makes one picture look like a 3D scene.