CoD: A Diffusion Foundation Model for Image Compression
By: Zhaoyang Jia, Zihan Zheng, Naifu Xue and more
Potential Business Impact:
Makes pictures smaller with better quality.
Existing diffusion codecs typically build on text-to-image diffusion foundation models such as Stable Diffusion. However, text conditioning is suboptimal from a compression perspective and limits the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce CoD, the first Compression-oriented Diffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages. High compression efficiency: replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp). Low-cost and reproducible training: roughly 300× faster training than Stable Diffusion (~20 vs. ~6,250 A100 GPU days) on entirely open image-only datasets. New insights: for example, we find that pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs with fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Code will be released.
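To give a feel for the two headline numbers in the abstract, the short sketch below works through the arithmetic. It is only an illustration: the 768×512 image size is an assumption chosen for the example and is not taken from the paper, and the helper names `bit_budget` and `speedup` are hypothetical.

```python
def bit_budget(width: int, height: int, bpp: float) -> float:
    """Total number of bits available for a width x height image at `bpp` bits per pixel."""
    return width * height * bpp


def speedup(baseline_gpu_days: float, new_gpu_days: float) -> float:
    """Ratio of training costs, expressed as 'times cheaper'."""
    return baseline_gpu_days / new_gpu_days


# Assumption: a 768x512 image (size picked for illustration, not from the source).
bits = bit_budget(768, 512, 0.0039)   # bitrate quoted in the abstract
size_bytes = bits / 8

# Approximate GPU-day figures quoted in the abstract (~6,250 vs. ~20 A100 GPU days).
ratio = speedup(6250, 20)

print(f"bit budget: {bits:.0f} bits (about {size_bytes:.0f} bytes)")
print(f"training cost ratio: {ratio:.1f}x")
```

Under these assumptions the whole image must fit in roughly 1,500 bits, or under 200 bytes, which is why the abstract calls this regime ultra-low bitrate. The GPU-day ratio comes out slightly above 300, consistent with the rounded figure in the abstract.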