Fast Multi-view Consistent 3D Editing with Video Priors
By: Liyi Chen , Ruihuang Li , Guowen Zhang and more
Potential Business Impact:
Changes 3D objects with simple text commands.
Text-driven 3D editing enables user-friendly 3D object or scene editing with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employing 2D generation or editing models to process each view individually, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to over-smoothed results because the different editing signals gathered from different views are averaged during the iterative process. In this paper, we propose generative Video Prior based 3D Editing (ViP3DE) to employ the temporal consistency priors from pre-trained video generation models for multi-view consistent 3D editing in a single forward pass. Our key insight is to condition the video generation model on a single edited view to generate other consistent edited views for 3D updating directly, thereby bypassing the iterative editing paradigm. Since 3D updating requires edited views to be paired with specific camera poses, we propose motion-preserved noise blending for the video model to generate edited views at predefined camera poses. In addition, we introduce geometry-aware denoising to further enhance multi-view consistency by integrating 3D geometric priors into video models. Extensive experiments demonstrate that our proposed ViP3DE can achieve high-quality 3D editing results even within a single forward pass, significantly outperforming existing methods in both editing quality and speed.
Similar Papers
3D-Consistent Multi-View Editing by Diffusion Guidance
CV and Pattern Recognition
Makes 3D pictures look right after editing.
Generative Video Motion Editing with 3D Point Tracks
CV and Pattern Recognition
Edits videos by changing how things move.
View-Consistent Diffusion Representations for 3D-Consistent Video Generation
CV and Pattern Recognition
Makes computer-made videos look more real.