InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior
By: Weimin Bai, Suzhe Xu, Yiwei Ren and more
Potential Business Impact:
Restores blurry videos instantly for streaming.
Video inverse problems are fundamental to streaming, telepresence, and AR/VR, where high perceptual quality must coexist with tight latency constraints. Diffusion-based priors currently deliver state-of-the-art reconstructions, but existing approaches either adapt image diffusion models with ad hoc temporal regularizers, which leads to temporal artifacts, or rely on native video diffusion models whose iterative posterior sampling is far too slow for real-time use. We introduce InstantViR, an amortized inference framework for ultra-fast video reconstruction powered by a pre-trained video diffusion prior. We distill a powerful bidirectional video diffusion model (teacher) into a causal autoregressive student that maps a degraded video directly to its restored version in a single forward pass, inheriting the teacher's strong temporal modeling while completely removing iterative test-time optimization. The distillation is prior-driven: it requires only the teacher diffusion model and known degradation operators, and does not rely on externally paired clean/noisy video data. To further boost throughput, we replace the video-diffusion backbone VAE with a high-efficiency LeanVAE via an innovative teacher-space regularized distillation scheme, enabling low-latency latent-space processing. Across streaming random inpainting, Gaussian deblurring, and super-resolution, InstantViR matches or surpasses the reconstruction quality of diffusion-based baselines while running at over 35 FPS on NVIDIA A100 GPUs, achieving speedups of up to 100x over iterative video diffusion solvers. These results show that diffusion-based video reconstruction is compatible with real-time, interactive, editable streaming scenarios, turning high-quality video restoration into a practical component of modern vision systems.
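The abstract refers to "known degradation operators" for random inpainting, Gaussian deblurring, and super-resolution. As a rough illustration of what such forward operators can look like, here is a minimal pure-Python sketch on a tiny grayscale video stored as nested lists. The function names, the keep probability, the kernel radius and sigma, and the use of average pooling for downsampling are all assumptions made for this example; they are not taken from the paper, which may define its operators differently.

```python
import math
import random

# A video is a list of frames; each frame is a list of rows of floats in [0, 1].
# All names and parameter values below are hypothetical, chosen for illustration.

def random_inpainting(video, keep_prob=0.5, seed=0):
    """Zero out pixels at random. Returns (masked_video, mask)."""
    rng = random.Random(seed)
    mask = [[[1 if rng.random() < keep_prob else 0 for _ in row]
             for row in frame] for frame in video]
    masked = [[[p * m for p, m in zip(row, mrow)]
               for row, mrow in zip(frame, mframe)]
              for frame, mframe in zip(video, mask)]
    return masked, mask

def gaussian_kernel(radius=1, sigma=1.0):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_frame(frame, kernel):
    """Separable 2D blur (rows, then columns) with edge clamping."""
    radius = len(kernel) // 2
    h, w = len(frame), len(frame[0])

    def clamp(v, size):
        return max(0, min(size - 1, v))

    horiz = [[sum(kernel[k + radius] * frame[i][clamp(j + k, w)]
                  for k in range(-radius, radius + 1))
              for j in range(w)] for i in range(h)]
    return [[sum(kernel[k + radius] * horiz[clamp(i + k, h)][j]
                 for k in range(-radius, radius + 1))
             for j in range(w)] for i in range(h)]

def gaussian_blur(video, radius=1, sigma=1.0):
    """Apply the same spatial Gaussian blur to every frame."""
    kernel = gaussian_kernel(radius, sigma)
    return [blur_frame(frame, kernel) for frame in video]

def downsample(video, factor=2):
    """Average-pool each frame by `factor` (a simple super-resolution forward model)."""
    out = []
    for frame in video:
        h, w = len(frame), len(frame[0])
        rows = []
        for i in range(0, h - h % factor, factor):
            row = []
            for j in range(0, w - w % factor, factor):
                block = [frame[i + di][j + dj]
                         for di in range(factor) for dj in range(factor)]
                row.append(sum(block) / len(block))
            rows.append(row)
        out.append(rows)
    return out

# Usage: a 3-frame, 4x4 constant video pushed through each operator.
video = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
masked, mask = random_inpainting(video)
blurred = gaussian_blur(video)
low_res = downsample(video, factor=2)
```

In a prior-driven setup like the one the abstract describes, operators of this kind would be applied to clean videos to produce degraded inputs, so no externally collected clean/degraded pairs are needed. How the paper actually samples the clean videos and parameterizes each operator is not stated in the abstract.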
Similar Papers
InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
CV and Pattern Recognition
Makes videos easy to change and edit.
Generative Neural Video Compression via Video Diffusion Prior
CV and Pattern Recognition
Makes videos look clearer and smoother when compressed.
DiTVR: Zero-Shot Diffusion Transformer for Video Restoration
CV and Pattern Recognition
Fixes blurry videos by tracking movement.