DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework
By: Wenzhuo Ma, Zhenzhong Chen
Potential Business Impact:
Makes compressed videos look better, decode faster, and take up less space.
In this work, we propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. Unlike conventional multi-step diffusion-based methods, DiffVC-OSD feeds the reconstructed latent representation directly into a One-Step Diffusion Model, enhancing perceptual quality in a single diffusion step guided by both the temporal context and the latent itself. To better exploit temporal dependencies, we design a Temporal Context Adapter that encodes the conditional inputs into multi-level features, providing finer-grained guidance for the Denoising UNet. Additionally, we employ an End-to-End Finetuning strategy to improve overall compression performance. Extensive experiments demonstrate that DiffVC-OSD achieves state-of-the-art perceptual compression performance, decoding about 20$\times$ faster and reducing bitrate by 86.92\% compared to the corresponding multi-step diffusion-based variant.
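Because the abstract hinges on a single conditioned denoising step, a small sketch may help make the data flow concrete. The code below is a minimal, hypothetical PyTorch rendering of the idea, assuming toy module sizes, a fixed noise-schedule value, and simple additive feature injection; the names (`TemporalContextAdapter`, `TinyDenoiser`, `one_step_refine`) and shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one-step, context-guided latent refinement in PyTorch.
# Module names, shapes, and the fixed-timestep schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalContextAdapter(nn.Module):
    """Encode a conditioning input into multi-level feature maps, one per
    resolution level of the denoiser (the role the paper's adapter plays)."""
    def __init__(self, in_ch=64, widths=(64, 128, 256), strides=(1, 2, 2)):
        super().__init__()
        chs = (in_ch,) + widths
        self.stages = nn.ModuleList(
            nn.Conv2d(chs[i], chs[i + 1], 3, stride=strides[i], padding=1)
            for i in range(len(widths)))

    def forward(self, ctx):
        feats, x = [], ctx
        for stage in self.stages:
            x = F.silu(stage(x))
            feats.append(x)          # guidance feature for one level
        return feats

class TinyDenoiser(nn.Module):
    """Stand-in for the Denoising UNet: predicts noise from the latent while
    adding the adapter's features at matching resolutions."""
    def __init__(self, ch=64, widths=(64, 128, 256)):
        super().__init__()
        self.inp = nn.Conv2d(ch, widths[0], 3, padding=1)
        self.down = nn.ModuleList(
            nn.Conv2d(widths[i], widths[i + 1], 3, stride=2, padding=1)
            for i in range(len(widths) - 1))
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(widths[i + 1], widths[i], 4, stride=2, padding=1)
            for i in reversed(range(len(widths) - 1)))
        self.out = nn.Conv2d(widths[0], ch, 3, padding=1)

    def forward(self, z, guidance):
        x = F.silu(self.inp(z) + guidance[0])
        for i, down in enumerate(self.down):
            x = F.silu(down(x) + guidance[i + 1])
        for up in self.up:
            x = F.silu(up(x))
        return self.out(x)           # predicted noise, same shape as z

def one_step_refine(z_hat, ctx, adapter, denoiser, alpha_bar=0.5):
    """Treat the reconstructed latent as a noisy sample at one fixed timestep
    and recover a clean latent with a single DDIM-style x0 prediction."""
    a = torch.tensor(alpha_bar)
    eps = denoiser(z_hat, adapter(ctx))
    return (z_hat - torch.sqrt(1 - a) * eps) / torch.sqrt(a)

# Toy usage: refine a 64-channel latent guided by temporal-context features.
z_hat = torch.randn(1, 64, 32, 32)   # reconstructed latent from the decoder
ctx = torch.randn(1, 64, 32, 32)     # temporal context (e.g., warped features)
z0 = one_step_refine(z_hat, ctx, TemporalContextAdapter(), TinyDenoiser())
print(z0.shape)                      # torch.Size([1, 64, 32, 32])
```

In a real system, the fixed `alpha_bar` would come from the diffusion model's training schedule, and multi-level guidance is more commonly injected via cross-attention or zero-initialized convolutions than by plain addition; the sketch only illustrates why a single conditioned step suffices at decode time.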
Similar Papers
OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution
CV and Pattern Recognition
Makes blurry videos clear, fast.
Generative Neural Video Compression via Video Diffusion Prior
CV and Pattern Recognition
Makes videos look clearer and smoother when compressed.
Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression
CV and Pattern Recognition
Makes pictures load super fast, looking great.