DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning
By: Dongxu Liu, Yuang Peng, Haomiao Tang, and more
Potential Business Impact:
Makes pictures smaller and clearer, and faster to generate.
Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high compression ratios, the training instability introduced by GAN-based objectives remains an open challenge. Beyond improving spatial compression, we also aim to minimize the dimensionality of the latent space, enabling more efficient and compact representations. To tackle these challenges, we focus on improving the decoder's expressiveness. Concretely, we propose DGAE, which employs a diffusion model to guide the decoder in recovering informative signals that are not fully decoded from the latent representation. With this design, DGAE effectively mitigates the performance degradation under high spatial compression rates and achieves state-of-the-art performance with a 2x smaller latent space. When integrated with diffusion models, DGAE demonstrates competitive image-generation performance on ImageNet-1K and shows that this compact latent representation facilitates faster convergence of the diffusion model.
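The core idea above can be illustrated with a toy sketch: a coarse decoder reconstructs pixels from a small latent, and a diffusion-style refinement loop then recovers detail the decoder alone misses. Everything here is a hypothetical stand-in, not the paper's architecture: `encode`/`decode` are random linear projections standing in for learned networks, and the "score" in `diffusion_refine` is a hand-crafted proxy for a learned, latent-conditioned score model.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, dim=4):
    # Toy "encoder" (assumption): a random linear projection of the pixels
    # into a low-dimensional latent, standing in for a learned tokenizer.
    W = rng.standard_normal((x.size, dim)) / np.sqrt(x.size)
    return x.flatten() @ W, W

def decode(z, W, shape):
    # Toy "decoder" (assumption): pseudo-inverse projection back to pixels.
    # This coarse reconstruction loses information at high compression.
    return (z @ np.linalg.pinv(W)).reshape(shape)

def diffusion_refine(x_hat, steps=10):
    # Toy diffusion-style guidance (assumption): start from a noised copy of
    # the coarse reconstruction and iteratively denoise it. The "score" here
    # simply pulls the sample back toward the decoder output; in DGAE this
    # role is played by a learned diffusion model conditioned on the latent.
    x = x_hat + 0.1 * rng.standard_normal(x_hat.shape)
    for _ in range(steps):
        score = x_hat - x          # proxy for a learned score estimate
        x = x + (1.0 / steps) * score
    return x

if __name__ == "__main__":
    image = np.ones((4, 4))
    z, W = encode(image)
    coarse = decode(z, W, image.shape)
    refined = diffusion_refine(coarse)
    print(refined.shape)  # same spatial shape as the input image
```

Each refinement step shrinks the deviation from the coarse reconstruction by a factor of (1 - 1/steps), so after `steps` iterations the injected noise is attenuated by roughly e^-1; a learned score model would instead steer the sample toward the true image manifold, adding back detail rather than merely denoising.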
Similar Papers
Generative Latent Diffusion for Efficient Spatiotemporal Data Reduction
Machine Learning (CS)
Saves space by smartly guessing missing video parts.
H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models
CV and Pattern Recognition
Creates videos on phones quickly and with high quality.
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
CV and Pattern Recognition
Helps doctors spot Alzheimer's from brain scans.