DeCo-VAE: Learning Compact Latents for Video Reconstruction via Decoupled Representation
By: Xiangchen Yin, Jiahui Yuan, Zhangchi Hu, and more
Potential Business Impact:
Makes videos smaller by separating key parts.
Existing video Variational Autoencoders (VAEs) generally overlook the similarity between frame contents, leading to redundant latent modeling. In this paper, we propose decoupled VAE (DeCo-VAE) to achieve compact latent representation. Instead of encoding RGB pixels directly, we decompose video content into distinct components via explicit decoupling: keyframe, motion, and residual, and learn a dedicated latent representation for each. To avoid cross-component interference, we design a dedicated encoder for each decoupled component and adopt a shared 3D decoder to maintain spatiotemporal consistency during reconstruction. We further employ a decoupled adaptation strategy that freezes a subset of the encoders while training the others sequentially, ensuring stable training and accurate learning of both static and dynamic features. Extensive quantitative and qualitative experiments demonstrate that DeCo-VAE achieves superior video reconstruction performance.
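To make the described architecture concrete, below is a minimal PyTorch sketch of the idea in the abstract: three dedicated component encoders feeding a single shared 3D decoder, trained with a staged freezing schedule. The decomposition rule, channel sizes, module names, and training loop here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the DeCo-VAE idea as summarized in the abstract.
# All layer sizes, the decoupling rule, and the staged schedule are assumptions.
import torch
import torch.nn as nn


class ComponentEncoder(nn.Module):
    """Dedicated encoder for one decoupled component (keyframe, motion, or residual)."""

    def __init__(self, in_ch: int = 3, latent_ch: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(32, latent_ch, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class DeCoVAE(nn.Module):
    """Three dedicated encoders plus one shared 3D decoder."""

    def __init__(self, latent_ch: int = 4):
        super().__init__()
        # One encoder per decoupled component to avoid cross-component interference.
        self.enc_key = ComponentEncoder(latent_ch=latent_ch)
        self.enc_motion = ComponentEncoder(latent_ch=latent_ch)
        self.enc_residual = ComponentEncoder(latent_ch=latent_ch)
        # A single shared 3D decoder reconstructs the video from the
        # concatenated component latents, preserving spatiotemporal consistency.
        self.decoder = nn.Sequential(
            nn.Conv3d(3 * latent_ch, 32, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(32, 3, kernel_size=3, padding=1),
        )

    @staticmethod
    def decouple(video: torch.Tensor):
        # Hypothetical decomposition (the paper's exact scheme is not given here):
        # the first frame broadcast over time as the keyframe, frame-to-frame
        # differences as motion, and whatever remains as the residual.
        key = video[:, :, :1].expand_as(video)
        diff = video[:, :, 1:] - video[:, :, :-1]
        motion = torch.cat([torch.zeros_like(video[:, :, :1]), diff], dim=2)
        residual = video - key - motion
        return key, motion, residual

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, time, height, width)
        key, motion, residual = self.decouple(video)
        z = torch.cat(
            [self.enc_key(key), self.enc_motion(motion), self.enc_residual(residual)],
            dim=1,
        )
        return self.decoder(z)


def set_trainable_stage(model: DeCoVAE, stage: str) -> None:
    """Decoupled adaptation (assumed schedule): unfreeze one encoder at a time."""
    encoders = {"key": model.enc_key, "motion": model.enc_motion,
                "residual": model.enc_residual}
    for name, enc in encoders.items():
        for p in enc.parameters():
            p.requires_grad = (name == stage)


model = DeCoVAE()
for stage in ["key", "motion", "residual"]:
    set_trainable_stage(model, stage)
    # ... optimize a reconstruction loss for this stage ...
    out = model(torch.randn(1, 3, 8, 32, 32))  # smoke test: (B, C, T, H, W)
```

Concatenating the component latents before the shared decoder is one plausible reading of "shared 3D decoder"; the paper may fuse the latents differently.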
Similar Papers
Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context
CV and Pattern Recognition
Makes videos smaller without losing quality.
Variational decomposition autoencoding improves disentanglement of latent representations
Machine Learning (CS)
Finds hidden patterns in sounds and body signals.
Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression
CV and Pattern Recognition
Makes videos smaller for faster streaming.