Flowception: Temporally Expansive Flow Matching for Video Generation
By: Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, and more
Potential Business Impact:
Makes videos by adding and fixing pictures.
We present Flowception, a novel non-autoregressive, variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation and drift, as the frame insertion mechanism during sampling acts as an efficient compression of long-term context. Compared to full-sequence flows, our method reduces training FLOPs three-fold, while also being more amenable to local attention variants and allowing the length of a video to be learned jointly with its content. Quantitative experiments show improved FVD and VBench metrics over autoregressive and full-sequence baselines, which is further validated by qualitative results. Finally, by learning to insert and denoise frames in a sequence, Flowception seamlessly integrates different tasks such as image-to-video generation and video interpolation.
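The interleaved sampling idea in the abstract — alternate between discretely inserting fresh noise frames and continuously denoising every unfinished frame — can be sketched as a toy loop. This is an illustration under loud assumptions, not the authors' implementation: `denoise_step` is a stand-in for the learned flow-matching velocity field (here it simply shrinks the latent toward zero via one Euler step), and the video length is fixed at `MAX_FRAMES` rather than learned jointly with content as in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_SHAPE = (4, 4)   # toy latent frame
NUM_STEPS = 10         # denoising steps per frame
MAX_FRAMES = 6         # fixed length; Flowception learns this instead

def denoise_step(frame):
    # Stand-in for the learned velocity field: one Euler step of a flow
    # that contracts toward zero (a proxy for the data distribution).
    dt = 1.0 / NUM_STEPS
    velocity = -frame  # hypothetical model output
    return frame + dt * velocity

def sample_video():
    # Each entry: [latent frame, denoising steps remaining].
    frames = []
    while True:
        # Discrete insertion: append a pure-noise frame until the
        # target length is reached, so frames are staggered in time.
        if len(frames) < MAX_FRAMES:
            frames.append([rng.standard_normal(FRAME_SHAPE), NUM_STEPS])
        # Continuous denoising: advance every unfinished frame one step.
        for entry in frames:
            if entry[1] > 0:
                entry[0] = denoise_step(entry[0])
                entry[1] -= 1
        if len(frames) == MAX_FRAMES and all(s == 0 for _, s in frames):
            return [f for f, _ in frames]

video = sample_video()
print(len(video))  # → 6
```

Because insertion and denoising are interleaved, earlier frames finish denoising before later ones, which is what lets the fully denoised prefix serve as compressed long-term context during sampling.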
Similar Papers
CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis
CV and Pattern Recognition
Creates fake CT scans from doctor's notes.
Edit Flows: Flow Matching with Edit Operations
Machine Learning (CS)
Lets computers write better stories and code.
Flow and Depth Assisted Video Prediction with Latent Transformer
CV and Pattern Recognition
Helps computers guess what's hidden in videos.