T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low Bitrates
By: Zhitao Wang, Hengyu Man, Wenrui Li, and more
Potential Business Impact:
Makes videos look good with less data.
Recent advances in video generation have given rise to an emerging paradigm, generative video coding, which leverages powerful generative priors for Ultra-Low Bitrate (ULB) scenarios. However, most existing methods are either limited to specific domains (e.g., facial or human videos) or depend excessively on high-level text guidance, which often fails to capture fine-grained motion details and leads to unrealistic or incoherent reconstructions. To address these challenges, we propose Trajectory-Guided Generative Video Coding (dubbed T-GVC), a novel framework that bridges low-level motion tracking with high-level semantic understanding. T-GVC features a semantic-aware sparse motion sampling pipeline that extracts pixel-wise motion as sparse trajectory points selected according to their semantic importance, significantly reducing the bitrate while preserving critical temporal semantic information. In addition, by integrating trajectory-aligned loss constraints into the diffusion process, we introduce a training-free guidance mechanism in latent space that ensures physically plausible motion patterns without sacrificing the inherent capabilities of generative models. Experimental results demonstrate that T-GVC outperforms both traditional and neural video codecs under ULB conditions. Furthermore, additional experiments confirm that our framework achieves more precise motion control than existing text-guided methods, paving the way for a novel direction of generative video coding guided by geometric motion modeling.
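The abstract does not spell out how trajectory points are chosen or how the trajectory-aligned loss is computed. The sketch below is a minimal, hypothetical illustration of those two ideas, assuming dense optical-flow fields and a per-pixel semantic importance map are already available (for example, from an off-the-shelf flow estimator and a segmentation model). The scoring rule (importance times motion magnitude), the minimum-spacing rule, the nearest-pixel flow propagation, the fixed 8-bit coordinate budget, and all function names (`select_keypoints`, `track`, `payload_bits`, `trajectory_loss`) are assumptions made for illustration, not T-GVC's actual method.

```python
import math
from typing import List, Tuple

# Assumed data layout (hypothetical, for illustration only):
Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (dx, dy) from frame t to t+1
Point = Tuple[float, float]             # (x, y) in pixel coordinates


def select_keypoints(flow: Flow, importance: List[List[float]],
                     k: int, min_dist: float) -> List[Point]:
    """Pick up to k sparse points, ranked by semantic importance x motion magnitude.

    Assumption: score = importance * |flow|; a candidate closer than min_dist to an
    already selected point is skipped so the samples spread over the frame.
    """
    scored = []
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            score = importance[y][x] * math.hypot(dx, dy)
            if score > 0:  # static or zero-importance pixels are never sampled
                scored.append((score, x, y))
    scored.sort(reverse=True)  # highest score first

    chosen: List[Point] = []
    for _, x, y in scored:
        if len(chosen) >= k:
            break
        if all(math.hypot(x - cx, y - cy) >= min_dist for cx, cy in chosen):
            chosen.append((float(x), float(y)))
    return chosen


def track(points: List[Point], flows: List[Flow]) -> List[List[Point]]:
    """Propagate each point through consecutive flow fields (nearest-pixel lookup)."""
    trajectories = [[p] for p in points]
    for flow in flows:
        h, w = len(flow), len(flow[0])
        for traj in trajectories:
            x, y = traj[-1]
            # Clamp to the frame and read the flow at the nearest pixel.
            xi = min(max(int(round(x)), 0), w - 1)
            yi = min(max(int(round(y)), 0), h - 1)
            dx, dy = flow[yi][xi]
            traj.append((x + dx, y + dy))
    return trajectories


def payload_bits(trajectories: List[List[Point]], bits_per_coord: int = 8) -> int:
    """Bits to send raw (x, y) coordinates, assuming a fixed-length code per coordinate."""
    return sum(len(t) for t in trajectories) * 2 * bits_per_coord


def trajectory_loss(pred: List[List[Point]], target: List[List[Point]]) -> float:
    """Mean squared distance between predicted and transmitted trajectory points."""
    total, n = 0.0, 0
    for p_traj, t_traj in zip(pred, target):
        for (px, py), (tx, ty) in zip(p_traj, t_traj):
            total += (px - tx) ** 2 + (py - ty) ** 2
            n += 1
    return total / n if n else 0.0


# Toy 4x4 example: two moving pixels, one in a less important region.
h, w = 4, 4
flow0 = [[(0.0, 0.0)] * w for _ in range(h)]
flow0[1][1] = (1.0, 0.0)  # pixel (x=1, y=1) moves right
flow0[2][3] = (0.0, 1.0)  # pixel (x=3, y=2) moves down
importance = [[1.0] * w for _ in range(h)]
importance[2][3] = 0.5    # hypothetical lower semantic importance

pts = select_keypoints(flow0, importance, k=2, min_dist=1.5)
tracks = track(pts, [flow0])
```

In this toy case only two trajectories of two positions each would be transmitted (64 bits at 8 bits per coordinate) instead of a dense 4x4 motion field. On the decoder side, a training-free guidance scheme in the style of classifier guidance could evaluate a loss like `trajectory_loss` on point positions read from the decoded latent and nudge the latent along its negative gradient, roughly z ← z − η∇_z L, at each denoising step; this update rule is a generic assumption, not the paper's stated formulation.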
Similar Papers
Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis
CV and Pattern Recognition
Sends clear pictures using tiny amounts of data.
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding
CV and Pattern Recognition
Makes video calls look better with less data.