DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation
By: Xiangchen Yin, Jiahui Yuan, Zhangchi Hu, and more
Potential Business Impact:
Makes videos smaller by separating key parts.
Existing video Variational Autoencoders (VAEs) generally overlook the similarity between frame contents, leading to redundant latent modeling. In this paper, we propose Decoupled VAE (DeCo-VAE) to achieve compact latent representations. Instead of encoding RGB pixels directly, we explicitly decouple video content into three distinct components (keyframe, motion, and residual) and learn a dedicated latent representation for each. To avoid cross-component interference, we design a separate encoder for each decoupled component and adopt a shared 3D decoder to maintain spatiotemporal consistency during reconstruction. We further employ a decoupled adaptation strategy that trains the encoders sequentially, freezing the others at each stage, which ensures stable training and accurate learning of both static and dynamic features. Extensive quantitative and qualitative experiments demonstrate that DeCo-VAE achieves superior video reconstruction performance.
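To make the pipeline concrete, below is a minimal PyTorch sketch of the idea the abstract describes: one encoder per decoupled component, a shared 3D decoder over the concatenated latents, and a sequential freeze-and-train schedule. Everything here (the additive keyframe/motion/residual split, channel sizes, module shapes, and the omission of KL and perceptual losses) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ComponentEncoder(nn.Module):
    """3D-conv encoder producing a Gaussian latent for one decoupled component."""

    def __init__(self, in_ch=3, latent_ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        self.to_mu = nn.Conv3d(64, latent_ch, 1)
        self.to_logvar = nn.Conv3d(64, latent_ch, 1)

    def forward(self, x):
        h = self.net(x)
        return self.to_mu(h), self.to_logvar(h)


class SharedDecoder(nn.Module):
    """Single 3D decoder shared by all components: it sees the concatenated
    latents, so spatiotemporal consistency is enforced in one place."""

    def __init__(self, latent_ch=8, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(3 * latent_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv3d(32, out_ch, 3, padding=1),
        )

    def forward(self, z_key, z_motion, z_res):
        return self.net(torch.cat([z_key, z_motion, z_res], dim=1))


def decompose(video):
    """Toy additive split into keyframe / motion / residual; the three parts
    sum back to the clip by construction. video: (B, C, T, H, W)."""
    keyframe = video[:, :, :1].expand_as(video)           # broadcast frame 0
    motion = video - torch.roll(video, shifts=1, dims=2)  # frame differences
    motion[:, :, 0] = 0                                   # no motion at t=0
    residual = video - keyframe - motion                  # whatever is left
    return keyframe, motion, residual


def reparameterize(mu, logvar):
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag


enc_key, enc_motion, enc_res = (ComponentEncoder() for _ in range(3))
decoder = SharedDecoder()
video = torch.randn(2, 3, 8, 32, 32)  # (B, C, T, H, W) toy clip

# Decoupled adaptation: train one encoder per stage, freezing the other two,
# with the shared decoder updated throughout.
stages = [(enc_key, (enc_motion, enc_res)),
          (enc_motion, (enc_key, enc_res)),
          (enc_res, (enc_key, enc_motion))]
for active, frozen in stages:
    set_trainable(active, True)
    for m in frozen:
        set_trainable(m, False)
    opt = torch.optim.Adam(
        list(active.parameters()) + list(decoder.parameters()), lr=1e-4)
    keyframe, motion, residual = decompose(video)
    z = [reparameterize(*enc(x)) for enc, x in
         ((enc_key, keyframe), (enc_motion, motion), (enc_res, residual))]
    recon = decoder(*z)
    loss = nn.functional.mse_loss(recon, video)  # KL terms omitted for brevity
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"stage loss: {loss.item():.4f}")
```

The stage loop mirrors the decoupled adaptation strategy: only one encoder receives gradients per stage, which is what keeps static and dynamic features from interfering during training.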
Similar Papers
Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion
CV and Pattern Recognition
Makes videos smaller without losing quality.
VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
CV and Pattern Recognition
Makes AI learn from videos using way less data.
Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE)
Machine Learning (CS)
Finds hidden patterns in messy science data.