Low-Cost Test-Time Adaptation for Robust Video Editing
By: Jianhui Wang, Yinda Chen, Yangfan He and more
Potential Business Impact:
Makes videos look better with less effort.
Video editing is a critical component of content creation that transforms raw footage into coherent works aligned with specific visual and narrative objectives. Existing approaches face two major challenges: temporal inconsistencies due to failure in capturing complex motion patterns, and overfitting to simple prompts arising from limitations in UNet backbone architectures. While learning-based methods can enhance editing quality, they typically demand substantial computational resources and are constrained by the scarcity of high-quality annotated data. In this paper, we present Vid-TTA, a lightweight test-time adaptation framework that personalizes optimization for each test video during inference through self-supervised auxiliary tasks. Our approach incorporates a motion-aware frame reconstruction mechanism that identifies and preserves crucial movement regions, alongside a prompt perturbation and reconstruction strategy that strengthens model robustness to diverse textual descriptions. These innovations are orchestrated by a meta-learning driven dynamic loss balancing mechanism that adaptively adjusts the optimization process based on video characteristics. Extensive experiments demonstrate that Vid-TTA significantly improves video temporal consistency and mitigates prompt overfitting while maintaining low computational overhead, offering a plug-and-play performance boost for existing video editing models.
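The abstract names two ingredients that a small sketch can make concrete: a reconstruction loss that emphasizes moving regions, and a weighted combination of auxiliary losses. The Python below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how motion regions are identified or how the loss weights are learned, so frame differencing and softmax-normalized weights are stand-ins chosen here, and every function name and parameter (`motion_mask`, `motion_weighted_mse`, `combined_loss`, `threshold`, `motion_weight`) is hypothetical.

```python
import math


def motion_mask(prev_frame, next_frame, threshold=0.1):
    """Mark pixels whose intensity changes by more than `threshold`.

    Frames are 2D lists of floats in [0, 1]. Frame differencing is an
    assumption made for illustration; the source does not specify it.
    """
    return [
        [abs(b - a) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def motion_weighted_mse(pred, target, mask, motion_weight=4.0):
    """Mean squared error with masked (moving) pixels upweighted."""
    weighted_error = 0.0
    total_weight = 0.0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, moving in zip(pred_row, target_row, mask_row):
            w = motion_weight if moving else 1.0
            weighted_error += w * (p - t) ** 2
            total_weight += w
    return weighted_error / total_weight


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(losses, logits):
    """Blend auxiliary losses using softmax weights derived from `logits`.

    In a real system the logits would be adapted per video; here they
    are plain inputs, a hypothetical stand-in for the learned balancing.
    """
    weights = softmax(logits)
    return sum(w * loss for w, loss in zip(weights, losses))


# Toy example: one pixel moves between two 2x2 frames.
prev_frame = [[0.0, 0.0], [0.0, 0.0]]
next_frame = [[0.0, 1.0], [0.0, 0.0]]
prediction = [[0.0, 0.5], [0.0, 0.0]]

mask = motion_mask(prev_frame, next_frame)
frame_loss = motion_weighted_mse(prediction, next_frame, mask)
total = combined_loss([frame_loss, 0.2], logits=[0.0, 0.0])
```

With equal logits the two losses are simply averaged; raising one logit shifts the objective toward that auxiliary task, which is the kind of per-video adjustment the abstract attributes to its loss balancing mechanism.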
Similar Papers
Test-Time Adaptation for Video Highlight Detection Using Meta-Auxiliary Learning and Cross-Modality Hallucinations
CV and Pattern Recognition
Makes video highlight finders work better on new videos.
Ultra-Light Test-Time Adaptation for Vision-Language Models
CV and Pattern Recognition
Makes AI better at seeing new things.
ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates
CV and Pattern Recognition
Makes AI better at understanding new pictures.