
MTV-Inpaint: Multi-Task Long Video Inpainting

Published: March 14, 2025 | arXiv ID: 2503.11412v1

By: Shiyuan Yang, Zheng Gu, Liang Hou and more

Potential Business Impact:

Adds or changes things in videos using words.

Business Areas:

Video Editing, Content and Publishing, Media and Entertainment, Video

Video inpainting involves modifying local regions within a video, ensuring spatial and temporal consistency. Most existing methods focus primarily on scene completion (i.e., filling missing regions) and lack the capability to insert new objects into a scene in a controllable manner. Fortunately, recent advancements in text-to-video (T2V) diffusion models pave the way for text-guided video inpainting. However, directly adapting T2V models for inpainting remains limited in unifying completion and insertion tasks, lacks input controllability, and struggles with long videos, thereby restricting their applicability and flexibility. To address these challenges, we propose MTV-Inpaint, a unified multi-task video inpainting framework capable of handling both traditional scene completion and novel object insertion tasks. To unify these distinct tasks, we design a dual-branch spatial attention mechanism in the T2V diffusion U-Net, enabling seamless integration of scene completion and object insertion within a single framework. In addition to textual guidance, MTV-Inpaint supports multimodal control by integrating various image inpainting models through our proposed image-to-video (I2V) inpainting mode. Additionally, we propose a two-stage pipeline that combines keyframe inpainting with in-between frame propagation, enabling MTV-Inpaint to effectively handle long videos with hundreds of frames. Extensive experiments demonstrate that MTV-Inpaint achieves state-of-the-art performance in both scene completion and object insertion tasks. Furthermore, it demonstrates versatility in derived applications such as multi-modal inpainting, object editing, removal, image object brush, and the ability to handle long videos. Project page: https://mtv-inpaint.github.io/.
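The two-stage idea in the abstract (inpaint a sparse set of keyframes first, then propagate to the frames between them) can be illustrated with a small scheduling sketch. This is not the authors' implementation: the abstract does not say how keyframes are chosen, so the uniform `stride` and the function name `plan_two_stage` are assumptions made here purely for illustration.

```python
from typing import List, Tuple


def plan_two_stage(num_frames: int, stride: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Hypothetical helper: plan keyframes and in-between segments.

    Assumes uniformly spaced keyframes (an illustrative choice, not
    something stated in the abstract). Returns the keyframe indices and
    the (start, end) keyframe pairs that bound each in-between segment.
    """
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must be positive")

    # Stage 1 targets: every `stride`-th frame, starting at frame 0.
    keyframes = list(range(0, num_frames, stride))

    # Always include the last frame so every segment is bounded on both sides.
    if keyframes[-1] != num_frames - 1:
        keyframes.append(num_frames - 1)

    # Stage 2 targets: the frames lying between consecutive keyframes.
    segments = list(zip(keyframes[:-1], keyframes[1:]))
    return keyframes, segments


if __name__ == "__main__":
    keys, segs = plan_two_stage(num_frames=10, stride=4)
    print(keys)
    print(segs)
```

Under this assumption, a long video is reduced to a short keyframe sequence for the first stage, and each segment can then be filled independently, conditioned on its two bounding keyframes.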

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

CV and Pattern Recognition

Fixes missing parts in videos, even long ones.

7 Mar 2025 1

88%

VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module

Multimedia

Removes sounds and objects from videos perfectly.

21 Sep 2025 0

87%

InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model

CV and Pattern Recognition

Fixes 3D virtual worlds instantly.

12 Jun 2025 1


Page Count

14 pages
