A Lightweight Dual-Mode Optimization for Generative Face Video Coding
By: Zihan Zhang, Shanzhi Yin, Bolin Chen and more
Potential Business Impact:
Makes video calls look better on phones.
Generative Face Video Coding (GFVC) achieves superior rate-distortion performance by leveraging the strong inference capabilities of deep generative models. However, its practical deployment is hindered by large model parameter counts and high computational costs. To address this, we propose a lightweight GFVC framework that introduces dual-mode optimization -- combining architectural redesign and operational refinement -- to reduce complexity whilst preserving reconstruction quality. Architecturally, we replace traditional 3 × 3 convolutions with slimmer and more efficient layers, reducing complexity without compromising feature expressiveness. Operationally, we develop a two-stage adaptive channel pruning strategy: (1) soft pruning during training identifies redundant channels via learnable thresholds, and (2) hard pruning permanently eliminates these channels post-training using a derived mask. This two-stage approach ensures both training stability and inference efficiency. Experimental results demonstrate that the proposed lightweight dual-mode optimization for GFVC achieves a 90.4% parameter reduction and an 88.9% computation saving compared to the baseline, whilst outperforming the state-of-the-art video coding standard, Versatile Video Coding (VVC), in terms of perceptual-level quality metrics. As such, the proposed method is expected to enable efficient GFVC deployment in resource-constrained environments such as mobile edge devices.
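The two-stage pruning idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how channel importance is scored or how the thresholds are parameterized, so the mean-absolute-weight score, the sigmoid gate, the `temperature` value, and all function names below are assumptions made purely for illustration.

```python
import math

def channel_importance(filters):
    """Assumed score: mean absolute weight of each output channel."""
    return [sum(abs(w) for w in f) / len(f) for f in filters]

def soft_mask(importance, threshold, temperature=0.05):
    """Stage 1 (training): a smooth gate in (0, 1) per channel, so a
    threshold could in principle be learned by gradient descent."""
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature))
            for s in importance]

def hard_mask(importance, threshold):
    """Stage 2 (post-training): a binary keep/drop decision per channel."""
    return [s > threshold for s in importance]

def prune(filters, mask):
    """Physically remove the channels whose mask entry is False."""
    return [f for f, keep in zip(filters, mask) if keep]

# Toy layer: four output channels, three weights each (hypothetical values).
filters = [
    [0.90, -0.80, 0.70],
    [0.01, -0.02, 0.01],
    [0.40, 0.50, -0.30],
    [0.00, 0.03, -0.01],
]
threshold = 0.1  # stands in for a threshold learned during training

importance = channel_importance(filters)
gates = soft_mask(importance, threshold)   # used while training
mask = hard_mask(importance, threshold)    # derived once training ends
pruned = prune(filters, mask)

print(f"channels kept: {len(pruned)} of {len(filters)}")
```

In this toy example the soft gates stay close to 1 for the strong channels and fall below 0.5 for the weak ones, which keeps training smooth. The hard mask then removes the weak channels outright, so the pruned layer is genuinely smaller at inference time.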