LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization

Published: December 2, 2025 | arXiv ID: 2512.02933v2

By: Zhihan Xiao, Lin Liu, Yixin Gao and more

Potential Business Impact:

Edits videos by adding or removing things without masks.

Business Areas:

Image Recognition Data and Analytics, Software

Text-guided video editing, particularly object removal and addition, remains challenging because edits must be both spatially precise and temporally consistent. Existing methods often rely on auxiliary masks or reference images for editing guidance, which limits their scalability and generalization. To address these issues, we propose LoVoRA, a novel framework for mask-free video object removal and addition that uses a learnable object-aware localization mechanism. Our approach uses a dataset construction pipeline that integrates image-to-video translation, optical flow-based mask propagation, and video inpainting, enabling temporally consistent edits. The core innovation of LoVoRA is its learnable object-aware localization mechanism, which provides dense spatio-temporal supervision for both object insertion and removal tasks. By leveraging a Diffusion Mask Predictor, LoVoRA achieves end-to-end video editing without requiring external control signals during inference. Extensive experiments and human evaluation demonstrate the effectiveness and high-quality performance of LoVoRA.

Project page: https://cz-5f.github.io/LoVoRA.github.io
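The abstract names optical flow-based mask propagation as one stage of the dataset pipeline but does not say how it is implemented. As a rough illustration only, the sketch below warps a binary object mask from one frame to the next, assuming a dense backward flow field is already available from some flow estimator. The function names `propagate_mask` and `propagate_through_video`, the list-based data layout, and the nearest-neighbour sampling are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical data layout for this sketch (not from the paper):
Mask = List[List[int]]                  # H x W binary mask, 1 = object pixel
Flow = List[List[Tuple[float, float]]]  # H x W backward flow (dx, dy) in pixels


def propagate_mask(prev_mask: Mask, backward_flow: Flow) -> Mask:
    """Warp prev_mask into the next frame using backward optical flow.

    backward_flow[y][x] = (dx, dy) means pixel (x, y) of the next frame
    originated at (x + dx, y + dy) in the previous frame. Sampling is
    nearest-neighbour; sources outside the frame count as background.
    """
    height, width = len(prev_mask), len(prev_mask[0])
    next_mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = backward_flow[y][x]
            src_x = int(round(x + dx))
            src_y = int(round(y + dy))
            if 0 <= src_x < width and 0 <= src_y < height:
                next_mask[y][x] = prev_mask[src_y][src_x]
    return next_mask


def propagate_through_video(first_mask: Mask, flows: List[Flow]) -> List[Mask]:
    """Chain the per-frame warp so one annotated mask covers the whole clip."""
    masks = [first_mask]
    for flow in flows:
        masks.append(propagate_mask(masks[-1], flow))
    return masks


if __name__ == "__main__":
    # Toy clip: a single object pixel moving one pixel to the right per frame.
    mask = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    flow = [[(-1.0, 0.0)] * 4 for _ in range(3)]
    for frame_mask in propagate_through_video(mask, [flow, flow]):
        print(frame_mask[1])
```

Chaining frame-to-frame warps like this accumulates drift over long clips, so a practical pipeline would likely add occlusion checks or periodic re-anchoring; those refinements are outside the scope of this sketch.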

LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization

CV and Pattern Recognition

Edits videos by adding or removing objects without masks.

2 Dec 2025

90%

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

CV and Pattern Recognition

Lets you change videos by drawing on them.

11 Jun 2025

89%

In-Context Sync-LoRA for Portrait Video Editing

CV and Pattern Recognition

Edits videos while keeping movements perfectly matched.

2 Dec 2025

Country of Origin

🇨🇳 China

Page Count

18 pages
