M3DDM+: An improved video outpainting by a modified masking strategy
By: Takuya Murakawa, Takumi Fukuzawa, Ning Ding, and more
Potential Business Impact:
Fixes videos by adding missing parts smoothly.
M3DDM provides a computationally efficient framework for video outpainting via latent diffusion modeling. However, it exhibits significant quality degradation, manifested as spatial blur and temporal inconsistency, in challenging scenarios with limited camera motion or large outpainting regions, where inter-frame information is scarce. We identify the cause as a training-inference mismatch in the masking strategy: M3DDM's training samples a random mask direction and width for each frame, whereas inference requires a consistent outpainting direction throughout the video. To address this, we propose M3DDM+, which fine-tunes the pretrained M3DDM model with a uniform mask direction and width applied across all frames during training. Experiments demonstrate that M3DDM+ substantially improves visual fidelity and temporal coherence in information-limited scenarios while maintaining computational efficiency. The code is available at https://github.com/tamaki-lab/M3DDM-Plus.
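To make the training-inference mismatch concrete, here is a minimal sketch of the two masking strategies in PyTorch. The function and parameter names (`make_directional_mask`, the 0.1-0.5 width range) are illustrative assumptions for exposition, not the authors' actual implementation; the point is only the contrast between per-frame and per-clip sampling.

```python
import random
import torch

DIRECTIONS = ["left", "right", "top", "bottom"]

def make_directional_mask(height, width, direction, ratio):
    """Binary mask (1 = region to outpaint) covering `ratio`
    of the frame on the given side. Illustrative helper."""
    mask = torch.zeros(height, width)
    span_h = max(1, int(height * ratio))
    span_w = max(1, int(width * ratio))
    if direction == "left":
        mask[:, :span_w] = 1.0
    elif direction == "right":
        mask[:, -span_w:] = 1.0
    elif direction == "top":
        mask[:span_h, :] = 1.0
    else:  # "bottom"
        mask[-span_h:, :] = 1.0
    return mask

def m3ddm_training_masks(num_frames, height, width):
    """M3DDM-style (assumed): direction and width are re-sampled
    per frame, so masks differ across the clip -- unlike inference,
    which outpaints one side consistently."""
    return torch.stack([
        make_directional_mask(height, width,
                              random.choice(DIRECTIONS),
                              random.uniform(0.1, 0.5))
        for _ in range(num_frames)
    ])

def m3ddm_plus_training_masks(num_frames, height, width):
    """M3DDM+-style (assumed): one direction and one width are
    sampled per clip and shared by every frame, matching the
    consistent directional outpainting used at inference."""
    direction = random.choice(DIRECTIONS)
    ratio = random.uniform(0.1, 0.5)
    mask = make_directional_mask(height, width, direction, ratio)
    return mask.unsqueeze(0).expand(num_frames, -1, -1)
```

Under this reading, M3DDM+ changes only the sampling granularity (per clip instead of per frame), which is why fine-tuning the existing pretrained model suffices rather than retraining from scratch.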
Similar Papers
Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation
CV and Pattern Recognition
Changes video camera views without messing up the picture.
GlobalPaint: Spatiotemporal Coherent Video Outpainting with Global Feature Guidance
CV and Pattern Recognition
Extends video frames by smartly guessing missing parts.
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
CV and Pattern Recognition
Fixes broken 3D objects by filling in missing parts.