LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
By: Chenjian Gao, Lihe Ding, Xin Cai, and more
Potential Business Impact:
Lets you edit a video by editing its first frame and marking, with a mask, which regions should change.
Video editing with diffusion models has achieved remarkable results in generating high-quality edits. However, current methods often rely on large-scale pretraining, limiting flexibility for specific edits. First-frame-guided editing provides control over the first frame but offers no flexibility over subsequent frames. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) tuning method that adapts pretrained Image-to-Video (I2V) models for flexible video editing. Our key innovation is using a spatiotemporal mask to strategically guide the LoRA fine-tuning process. This teaches the model two distinct skills: first, to interpret the mask as a command to either preserve content from the source video or generate new content in designated regions; second, for these generated regions, to synthesize either temporally consistent motion inherited from the source video or novel appearances guided by user-provided reference frames. This dual-capability LoRA grants users control over the edit's entire temporal evolution, allowing complex transformations such as an object rotating or a flower blooming. Experimental results show our method achieves superior video editing performance compared to baseline methods. Project Page: https://cjeen.github.io/LoRAEdit
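The abstract does not specify the training objective, so the sketch below is only one plausible reading of "using a spatiotemporal mask to guide the LoRA fine-tuning process": a standard diffusion denoising loss whose per-element contribution is weighted by the mask, so preserved and generated regions can be emphasized differently while only the LoRA parameters receive gradients. The function name, tensor shapes, and weighting scheme are illustrative assumptions, not the paper's actual formulation (in practice the mask may also be fed to the model as conditioning).

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: names, shapes, and the weighting below are
# assumptions inferred from the abstract, not the authors' code.

def mask_weighted_diffusion_loss(noise_pred: torch.Tensor,
                                 noise: torch.Tensor,
                                 mask: torch.Tensor,
                                 edit_weight: float = 1.0,
                                 preserve_weight: float = 1.0) -> torch.Tensor:
    """Mask-aware denoising loss for LoRA fine-tuning.

    mask: (B, 1, T, H, W) spatiotemporal mask, 1.0 in regions where new
    content should be generated, 0.0 in regions preserved from the source
    video. Both regions contribute to the loss, but with separate weights,
    so the adapter can learn the preserve-vs-generate behavior per region.
    """
    per_elem = F.mse_loss(noise_pred, noise, reduction="none")
    weights = edit_weight * mask + preserve_weight * (1.0 - mask)
    return (weights * per_elem).mean()

# Toy usage with random tensors shaped like latent video:
# (batch, channels, frames, height, width).
pred = torch.randn(2, 4, 8, 32, 32, requires_grad=True)
target = torch.randn_like(pred)
mask = (torch.rand(2, 1, 8, 32, 32) > 0.5).float()
loss = mask_weighted_diffusion_loss(pred, target, mask)
loss.backward()  # in real training, gradients would flow only into LoRA params
```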
Similar Papers
In-Context Sync-LoRA for Portrait Video Editing
CV and Pattern Recognition
Edits videos while keeping movements perfectly matched.
LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization
CV and Pattern Recognition
Edits videos by adding or removing things without masks.