Score: 3

3DM-WeConvene: Learned Image Compression with 3D Multi-Level Wavelet-Domain Convolution and Entropy Model

Published: April 7, 2025 | arXiv ID: 2504.04658v1

By: Haisheng Fu, Jie Liang, Feng Liang and more

BigTech Affiliations: Google

Potential Business Impact:

Makes pictures smaller with less detail lost.

Business Areas:

Image Recognition Data and Analytics, Software

Learned image compression (LIC) has recently made significant progress, surpassing traditional methods. However, most LIC approaches operate mainly in the spatial domain and lack mechanisms for reducing frequency-domain correlations. To address this, we propose a novel framework that integrates low-complexity 3D multi-level Discrete Wavelet Transform (DWT) into convolutional layers and entropy coding, reducing both spatial and channel correlations to improve frequency selectivity and rate-distortion (R-D) performance. Our proposed 3D multi-level wavelet-domain convolution (3DM-WeConv) layer first applies a 3D multi-level DWT (e.g., the 5/3 and 9/7 wavelets from JPEG 2000) to transform data into the wavelet domain. Then, convolutions of different sizes are applied to different frequency subbands, followed by an inverse 3D DWT to return to the spatial domain. The 3DM-WeConv layer can be flexibly used within existing CNN-based LIC models. We also introduce a 3D wavelet-domain channel-wise autoregressive entropy model (3DWeChARM), which performs slice-based entropy coding in the 3D DWT domain. Low-frequency (LF) slices are encoded first to provide priors for high-frequency (HF) slices. A two-step training strategy is adopted: first balancing LF and HF rates, then fine-tuning with separate weights. Extensive experiments demonstrate that our framework consistently outperforms state-of-the-art CNN-based LIC methods in both R-D performance and computational complexity, with larger gains for high-resolution images. On the Kodak, Tecnick 100, and CLIC test sets, our method achieves BD-Rate reductions of 12.24%, 15.51%, and 12.97%, respectively, compared to H.266/VVC.
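To make the wavelet step in the abstract more concrete, here is a minimal sketch of a multi-level reversible 5/3 (LeGall) DWT using integer lifting, in one dimension only. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the input length is assumed to be divisible by 2 at every level, and boundaries use simple symmetric extension. A separable 3D transform would apply the same 1D step along the channel, height, and width axes in turn; the abstract does not say whether the paper does exactly that.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform (even-length integer list)."""
    even, odd = x[0::2], x[1::2]
    n = len(odd)
    # Predict: each odd sample minus the floor-average of its two even neighbours.
    # At the right edge the last even sample is mirrored (symmetric extension).
    high = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: each even sample plus a rounded quarter of the neighbouring details.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    return low, high


def dwt53_inverse(low, high):
    """Undo one lifting level by reversing the update step, then the predict step."""
    n = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    odd = [high[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


def multilevel_forward(x, levels):
    """Repeatedly split the low band; returns (coarsest low band, high bands, finest first)."""
    low, highs = list(x), []
    for _ in range(levels):
        low, high = dwt53_forward(low)
        highs.append(high)
    return low, highs


def multilevel_inverse(low, highs):
    """Rebuild the signal from the coarsest level back to the finest."""
    for high in reversed(highs):
        low = dwt53_inverse(low, high)
    return low


signal = [10, 12, 14, 13, 9, 8, 8, 40]
low, highs = multilevel_forward(signal, levels=2)
# Integer lifting is exactly invertible, so the round trip is lossless.
assert multilevel_inverse(low, highs) == signal
```

In this sketch the low band carries the smooth content and the high bands carry local detail, which is the split that lets a layer treat frequency subbands differently, and lets an entropy model code low-frequency parts first.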

3D Wavelet Convolutions with Extended Receptive Fields for Hyperspectral Image Classification

CV and Pattern Recognition

Helps computers see details in special pictures.

15 Apr 2025

88%

Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation

CV and Pattern Recognition

Teaches computers to see faster using wavelets.

2 Mar 2025

87%

LoC-LIC: Low Complexity Learned Image Coding Using Hierarchical Feature Transforms

Image and Video Processing

Makes pictures smaller using less computer power.

30 Apr 2025


Country of Origin

🇨🇳 🇺🇸 🇨🇦 Canada, China, United States

Page Count

13 pages
