Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
By: Yamato Arai, Yuma Ichikawa
Potential Business Impact:
Makes large AI language models smaller and faster to run, without retraining them.
Layer-wise PTQ is a promising technique for compressing large language models (LLMs), due to its simplicity and effectiveness without requiring retraining. However, recent progress in this area is saturating, underscoring the need to revisit its core limitations and explore further improvements. We address this challenge by identifying a key limitation of existing layer-wise PTQ methods: the growth of quantization errors across layers significantly degrades performance, particularly in low-bit regimes. To address this fundamental issue, we propose Quantization Error Propagation (QEP), a general, lightweight, and scalable framework that enhances layer-wise PTQ by explicitly propagating quantization errors and compensating for accumulated errors. QEP also offers a tunable propagation mechanism that prevents overfitting and controls computational overhead, enabling the framework to adapt to various architectures and resource budgets. Extensive experiments on several LLMs demonstrate that QEP-enhanced layer-wise PTQ achieves substantially higher accuracy than existing methods. Notably, the gains are most pronounced in the extremely low-bit quantization regime.
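The abstract does not include an implementation, but the core idea it describes, calibrating each layer on the partially quantized model's activations and compensating for the error accumulated in earlier layers, can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the authors' algorithm: the round-to-nearest quantizer, the least-squares compensation step, the `alpha` propagation knob, and all function names are assumptions standing in for whichever layer-wise PTQ solver (e.g., GPTQ) a framework like QEP would wrap.

```python
import torch

def rtn_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Symmetric round-to-nearest quantizer (stand-in for any layer-wise PTQ step)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def layerwise_ptq_with_error_propagation(layers, calib_x, n_bits=4, alpha=1.0):
    """Hypothetical sketch of error-propagated layer-wise PTQ.

    layers  : list of torch.nn.Linear modules applied in sequence
    calib_x : calibration inputs to the first layer, shape (N, d_in)
    alpha   : assumed propagation strength in [0, 1]; alpha = 0 recovers
              standard layer-wise PTQ calibrated on full-precision activations
    """
    x_fp = calib_x          # activations of the full-precision model
    x_q = calib_x.clone()   # activations of the partially quantized model
    for layer in layers:
        w = layer.weight.data                      # (d_out, d_in)
        target = x_fp @ w.t()                      # full-precision layer output
        # Propagated input: a blend of quantized-model and full-precision
        # activations, so the current layer "sees" the accumulated error.
        x_in = alpha * x_q + (1.0 - alpha) * x_fp
        # Compensation: re-fit the weights so the propagated input reproduces
        # the full-precision output (plain least squares here; a real method
        # would fold this into its quantization objective).
        w_comp = torch.linalg.lstsq(x_in, target).solution.t()
        w_q = rtn_quantize(w_comp, n_bits)
        layer.weight.data = w_q
        # Propagate activations so the next layer is calibrated on them.
        x_fp = target
        x_q = x_in @ w_q.t()
    return layers
```

The `alpha` knob here only loosely mirrors the "tunable propagation mechanism" mentioned in the abstract; how QEP actually parameterizes propagation and controls overhead is not specified in this summary.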
Similar Papers
ZeroQAT: Your Quantization-aware Training but Efficient
Machine Learning (CS)
Makes smart computer programs faster and smaller.
Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
CV and Pattern Recognition
Makes computer models smaller without losing accuracy.
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
Machine Learning (CS)
Makes smart computer programs smaller and faster.