PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing
By: Mustafa Munir, Md Mostafijur Rahman, Kartikeya Bhardwaj and more
Long-form video editing poses unique challenges due to the exponential increase in computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across extended sequences. To address these limitations, we propose PipeFlow, a scalable, pipelined video editing method that introduces three key innovations. First, based on a motion analysis using the Structural Similarity Index Measure (SSIM) and optical flow, we identify frames with low motion and propose to skip editing them. Second, we propose a pipelined task scheduling algorithm that splits a video into multiple segments and performs DDIM inversion and joint editing in parallel, based on available GPU memory. Lastly, we leverage a neural network-based interpolation technique to smooth out the border frames between segments and to interpolate the previously skipped frames. Our method uniquely scales to longer videos by dividing them into smaller segments, allowing PipeFlow's editing time to increase linearly with video length. In principle, this enables editing of infinitely long videos without the growing per-frame computational overhead encountered by other methods. PipeFlow achieves up to a 9.6X speedup compared to TokenFlow and a 31.7X speedup over Diffusion Motion Transfer (DMT).
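The low-motion frame skipping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single-window (global) SSIM instead of the usual sliding-window variant, omits the optical flow signal entirely, and compares each frame against the last frame selected for editing. The function names `global_ssim` and `select_frames`, the flat grayscale frame representation, and the 0.95 threshold are all assumptions made for illustration.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM between two flat grayscale frames of equal length.

    Assumption: one global window over the whole frame, not the
    sliding-window SSIM that image libraries usually compute.
    """
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = _mean(a), _mean(b)
    var_a = _mean([(x - mu_a) ** 2 for x in a])
    var_b = _mean([(y - mu_b) ** 2 for y in b])
    cov = _mean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def select_frames(frames, threshold=0.95):
    """Split frame indices into (edit, skip) lists.

    A frame is skipped when it is very similar (SSIM >= threshold) to the
    last frame chosen for editing. The threshold is a hypothetical value.
    """
    if not frames:
        return [], []
    edit, skip = [0], []  # always edit the first frame
    for i in range(1, len(frames)):
        if global_ssim(frames[edit[-1]], frames[i]) >= threshold:
            skip.append(i)  # low motion: leave for later interpolation
        else:
            edit.append(i)  # enough change: send through the editor
    return edit, skip


if __name__ == "__main__":
    static = [10, 20, 30, 40]
    moved = [200, 10, 90, 250]
    print(select_frames([static, list(static), moved]))
```

In a pipeline of the kind the abstract describes, the skipped indices would later be filled in by the interpolation stage, while the selected frames would be grouped into segments for inversion and editing.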
Similar Papers
Consistent Video Editing as Flow-Driven Image-to-Video Generation
CV and Pattern Recognition
Makes videos change smoothly, even with moving parts.
MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation
CV and Pattern Recognition
Makes computers create much longer, better videos.
AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection
CV and Pattern Recognition
Edits long videos much faster and better.