InstructVEdit: A Holistic Approach for Instructional Video Editing
By: Chi Zhang, Chengjian Feng, Feng Yan and more
Potential Business Impact:
Lets you edit videos by just telling it what to do.
Instruction-based video editing is a highly challenging task because large-scale, high-quality paired video editing data is difficult to collect. This scarcity not only limits the availability of training data but also hinders the systematic exploration of model architectures and training strategies. While prior work has improved specific aspects of video editing (e.g., synthesizing a video dataset using image editing techniques or decomposed video editing training), a holistic framework addressing the above challenges remains underexplored. In this study, we introduce InstructVEdit, a full-cycle instructional video editing approach that: (1) establishes a reliable dataset curation workflow to initialize training, (2) incorporates two model architectural improvements to enhance edit quality while preserving temporal consistency, and (3) proposes an iterative refinement strategy leveraging real-world data to enhance generalization and minimize train-test discrepancies. Extensive experiments show that InstructVEdit achieves state-of-the-art performance in instruction-based video editing, demonstrating robust adaptability to diverse real-world scenarios. Project page: https://o937-blip.github.io/InstructVEdit.
Similar Papers
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
CV and Pattern Recognition
Changes videos just by being told what to do.
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
CV and Pattern Recognition
Makes editing videos easier with text commands.
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
CV and Pattern Recognition
Teaches computers to edit pictures better with clearer instructions.