Integrating Pruning with Quantization for Efficient Deep Neural Network Compression
By: Sara Makenali, Babak Rokh, Ali Azarpeyvand
Potential Business Impact:
Makes deep learning models smaller and faster so they can run on low-power devices.
Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains challenging because their many layers and parameters impose considerable computational and memory demands. To address this, pruning and quantization are two widely used compression techniques, though most studies apply them individually to reduce model size and improve processing speed. Combining the two can yield even greater compression, but integrating them effectively to exploit their complementary advantages is difficult, primarily because of their potential impact on model accuracy and the complexity of jointly optimizing both processes. In this paper, we propose two approaches that integrate similarity-based filter pruning with Adaptive Power-of-Two (APoT) quantization to achieve higher compression efficiency while preserving model accuracy. In the first approach, pruning and quantization are applied simultaneously during training. In the second, pruning is performed first to remove less important parameters, and the pruned model is then quantized using low-bit representations. Experimental results demonstrate that both approaches achieve effective model compression with minimal accuracy degradation, making them well suited for deployment on devices with limited computational resources.
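To make the two building blocks concrete, below is a minimal PyTorch sketch, assuming cosine similarity as the filter-similarity measure and a single power-of-two level per weight. The function names (`similarity_prune_mask`, `pot_quantize`) and all hyperparameters are illustrative, not the authors' implementation; in particular, `pot_quantize` is a single-term simplification of APoT, which combines multiple power-of-two terms.

```python
import torch
import torch.nn.functional as F


def similarity_prune_mask(conv_weight: torch.Tensor, prune_ratio: float = 0.3) -> torch.Tensor:
    """Return a boolean keep-mask over output filters, dropping the most
    redundant ones by pairwise cosine similarity (an illustrative stand-in
    for the paper's similarity-based filter pruning)."""
    n = conv_weight.shape[0]
    flat = conv_weight.reshape(n, -1)
    # Pairwise cosine similarity between all filters: an (n, n) matrix.
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
    sim.fill_diagonal_(-1.0)  # a filter is not redundant with itself
    # Score each filter by its highest similarity to any other filter;
    # the most redundant filters are pruned first.
    redundancy = sim.max(dim=1).values
    k = int(prune_ratio * n)
    keep = torch.ones(n, dtype=torch.bool)
    if k > 0:
        keep[torch.topk(redundancy, k).indices] = False
    return keep


def pot_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Round each weight to the nearest signed power-of-two level
    representable in `bits` bits (a simplified stand-in for APoT)."""
    sign = w.sign()
    mag = w.abs().clamp(min=1e-8)
    max_exp = mag.max().log2().floor().item()
    min_exp = max_exp - (2 ** (bits - 1) - 1) + 1  # lowest usable exponent
    exp = mag.log2().round().clamp(min_exp, max_exp)
    return sign * torch.pow(2.0, exp)


# Second approach from the abstract: prune first, then quantize the survivors.
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
keep = similarity_prune_mask(conv.weight.data, prune_ratio=0.25)
with torch.no_grad():
    conv.weight.data[~keep] = 0.0  # remove redundant filters
    conv.weight.data[keep] = pot_quantize(conv.weight.data[keep], bits=4)
```

For the first approach, which applies both operations during training, one would typically apply the mask and the quantizer in the forward pass and back-propagate through the rounding with a straight-through estimator; that detail is omitted here for brevity.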
Similar Papers
Compressing CNN models for resource-constrained systems by channel and layer pruning
Machine Learning (CS)
Makes smart computer programs smaller and faster.
CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks
Machine Learning (CS)
Shrinks computer programs without losing quality.
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Machine Learning (CS)
Makes AI smaller and faster for phones.