ZeroQAT: Your Quantization-aware Training but Efficient
By: Qitao Tan, Xiaoying Song, Jin Lu, and others
Potential Business Impact:
Makes smart computer programs run faster and smaller.
Quantization is an effective technique to reduce the deployment cost of large language models (LLMs), and post-training quantization (PTQ) has been widely studied due to its efficiency. However, existing low-bit PTQ methods suffer from accuracy degradation because their layer-wise optimization introduces cumulative error propagation and misalignment between local reconstruction objectives and downstream performance. While quantization-aware training (QAT) provides a principled solution, its reliance on backpropagation incurs prohibitive data, time, and memory costs, limiting its practicality. To address these challenges, we propose ZeroQAT, a zeroth-order optimization-based QAT framework. ZeroQAT leverages forward-only gradient estimation to eliminate the need for backpropagation, significantly reducing computational and memory overhead while retaining the benefits of end-to-end optimization. Moreover, ZeroQAT jointly learns quantized weights, weight clipping thresholds, and equivalent transformations to mitigate quantization error and handle activation outliers. Experiments demonstrate that ZeroQAT achieves the efficiency of PTQ while retaining the accuracy of QAT, offering a practical solution for high-quality low-bit quantization of LLMs.
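The abstract's key idea is forward-only (zeroth-order) gradient estimation: the gradient is approximated from loss evaluations alone, so no backpropagation graph or activation storage is needed. Below is a minimal sketch of one classical estimator of this kind, SPSA (simultaneous perturbation stochastic approximation), on a toy loss; this is an illustration of the general technique, not ZeroQAT's exact estimator, and `loss_fn`, `spsa_gradient`, and the toy quadratic are assumptions for the example.

```python
import numpy as np

def spsa_gradient(loss_fn, w, eps=1e-3, seed=0):
    """Estimate the gradient of loss_fn at w from two forward passes (SPSA)."""
    rng = np.random.default_rng(seed)
    # Random Rademacher (+/-1) perturbation direction.
    delta = rng.choice([-1.0, 1.0], size=w.shape)
    # Two forward evaluations only -- no backpropagation needed.
    loss_plus = loss_fn(w + eps * delta)
    loss_minus = loss_fn(w - eps * delta)
    # Finite-difference slope along delta, projected back onto delta.
    return (loss_plus - loss_minus) / (2 * eps) * delta

# Toy quadratic loss; its true gradient at w is 2w.
loss = lambda w: float(np.sum(w ** 2))
w = np.array([1.0, -2.0, 0.5])
g = spsa_gradient(loss, w)
w_new = w - 0.1 * g  # one zeroth-order SGD step
```

In a QAT setting, `loss_fn` would be the forward pass of the quantized model, so the same two-evaluation trick updates quantized weights, clipping thresholds, and transformation parameters without the data, time, and memory cost of backpropagation.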
Similar Papers
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
CV and Pattern Recognition
Makes AI smarter and faster using less computer power.
PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks
CV and Pattern Recognition
Makes AI models smarter and faster for self-driving cars.