DisCo3D: Distilling Multi-View Consistency for 3D Scene Editing
By: Yufeng Chi, Huimin Ma, Kafeng Wang, and more
Potential Business Impact:
Edits 3D objects in scenes while keeping every viewing angle consistent.
While diffusion models have demonstrated remarkable progress in 2D image generation and editing, extending these capabilities to 3D editing remains challenging, particularly in maintaining multi-view consistency. Classical approaches typically update 3D representations through iterative refinement based on a single editing view. However, these methods often suffer from slow convergence and blurry artifacts caused by cross-view inconsistencies. Recent methods improve efficiency by propagating 2D editing attention features, yet still exhibit fine-grained inconsistencies and failure modes in complex scenes due to insufficient constraints. To address this, we propose DisCo3D, a novel framework that distills 3D consistency priors into a 2D editor. Our method first fine-tunes a 3D generator using multi-view inputs for scene adaptation, then trains a 2D editor through consistency distillation. The edited multi-view outputs are finally optimized into 3D representations via Gaussian Splatting. Experimental results show DisCo3D achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality.
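To make the distillation stage of this pipeline concrete, below is a minimal PyTorch sketch of what consistency distillation could look like: a scene-adapted multi-view "teacher" produces per-view features, and a 2D editor is trained to match them. All names here (MultiViewTeacher, Editor2D, distillation_step) are hypothetical, the networks are stand-in MLPs rather than the paper's diffusion models, and the MSE objective is an assumed placeholder for the actual distillation loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the paper's networks. The real method uses
# diffusion-based models; these toy MLPs only illustrate the data flow.
class MultiViewTeacher(nn.Module):
    """Scene-adapted 3D generator (Stage 1), frozen during distillation."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))

    def forward(self, views):            # views: (B, V, feat_dim)
        return self.encoder(views)       # multi-view-consistent features

class Editor2D(nn.Module):
    """2D editor trained to absorb the teacher's consistency prior (Stage 2)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, view):             # view: (N, feat_dim), one view at a time
        return self.net(view)

def distillation_step(teacher, editor, optimizer, views):
    """One consistency-distillation step: pull the editor's per-view features
    toward the frozen teacher's multi-view features (assumed MSE objective)."""
    with torch.no_grad():
        target = teacher(views)          # (B, V, D), no gradient into teacher
    B, V, D = views.shape
    pred = editor(views.reshape(B * V, D)).reshape(B, V, D)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    teacher, editor = MultiViewTeacher(), Editor2D()
    opt = torch.optim.Adam(editor.parameters(), lr=1e-4)
    views = torch.randn(2, 8, 256)       # 2 scenes x 8 views of toy features
    for step in range(3):
        print(f"step {step}: loss = {distillation_step(teacher, editor, opt, views):.4f}")
    # Stage 3 (not shown): the edited multi-view outputs would be fused into
    # a 3D representation by optimizing Gaussian Splatting against them.
```

The key design point the sketch captures is that the teacher sees all views jointly while the editor processes views independently; distillation is what transfers the cross-view constraint into the per-view editor.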
Similar Papers
3D-Consistent Multi-View Editing by Diffusion Guidance
CV and Pattern Recognition
Makes 3D pictures look right after editing.
View-Consistent Diffusion Representations for 3D-Consistent Video Generation
CV and Pattern Recognition
Makes computer-made videos look more real.
Coupled Diffusion Sampling for Training-Free Multi-View Image Editing
CV and Pattern Recognition
Edits pictures from many angles, all matching.