APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
By: Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen, and more
Potential Business Impact:
Makes AI see better with less computing power.
Vision Transformers (ViTs) have become one of the most widely used backbones for vision tasks. Despite their remarkable performance, they often suffer significant accuracy drops when quantized for practical deployment, particularly under post-training quantization (PTQ) at ultra-low bit-widths. Recently, reconstruction-based PTQ methods have shown promising performance in quantizing Convolutional Neural Networks (CNNs). However, they fail when applied to ViTs, primarily due to inaccurate estimation of output importance and severe accuracy degradation when quantizing post-GELU activations. To address these issues, we propose APHQ-ViT, a novel PTQ approach based on importance estimation with the Average Perturbation Hessian (APH). Specifically, we first thoroughly analyze existing Hessian-loss approximation approaches and propose an improved average perturbation Hessian loss. To handle the quantization of post-GELU activations, we design an MLP Reconstruction (MR) method that replaces the GELU function in the MLP with ReLU and reconstructs the block with the APH loss on a small unlabeled calibration set. Extensive experiments demonstrate that APHQ-ViT, using linear quantizers, outperforms existing PTQ methods by substantial margins at 3-bit and 4-bit across different vision tasks. The source code is available at https://github.com/GoatWu/APHQ-ViT.
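The abstract names two technical pieces: an Average Perturbation Hessian (APH) loss for weighting output reconstruction, and an MLP Reconstruction (MR) step that swaps GELU for ReLU and re-fits the block on a small unlabeled calibration set. Below is a minimal PyTorch sketch of what such a pipeline could look like; it is not the authors' implementation (see the linked repository for that), and all function names, the probe count `n_probes`, and the perturbation scale `eps` are illustrative. The Hessian diagonal here is a generic Hutchinson-style perturbation estimate; the paper's exact APH formulation may differ in how it approximates and averages the Hessian.

```python
# Minimal sketch, NOT the authors' code: (a) a perturbation-based diagonal
# Hessian estimate used to weight an output-reconstruction loss, and
# (b) GELU -> ReLU replacement in an MLP block followed by re-fitting it
# against the original full-precision block on calibration data.
import copy

import torch
import torch.nn as nn


def _grad(loss_fn, x):
    """Gradient of a scalar loss w.r.t. a detached copy of `x`."""
    x = x.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(loss_fn(x), x)
    return g


def perturbation_hessian_diag(loss_fn, output, eps=1e-3, n_probes=8):
    """Hutchinson-style estimate of diag(H) of a scalar loss w.r.t. `output`.

    Uses E[v * (H v)] = diag(H) for Rademacher probes v, with the
    Hessian-vector product H v approximated by a central difference of
    gradients at output +/- eps * v.
    """
    diag = torch.zeros_like(output)
    for _ in range(n_probes):
        v = torch.randint_like(output, 0, 2) * 2.0 - 1.0  # entries in {-1, +1}
        hv = (_grad(loss_fn, output + eps * v)
              - _grad(loss_fn, output - eps * v)) / (2.0 * eps)
        diag += v * hv
    # Clamp so the importance weights stay non-negative (a design choice).
    return (diag / n_probes).clamp_min(0.0)


def hessian_weighted_mse(q_out, fp_out, h_diag):
    """Importance-weighted reconstruction objective for a quantized block."""
    return (h_diag * (q_out - fp_out) ** 2).mean()


def reconstruct_mlp_with_relu(mlp, calib_batches, h_diag=None, steps=500, lr=1e-4):
    """Replace GELU with ReLU inside `mlp`, then tune the modified block so
    its outputs match the original full-precision block on calibration data."""
    fp_mlp = copy.deepcopy(mlp).eval()  # frozen full-precision teacher
    gelu_names = [n for n, m in mlp.named_modules() if isinstance(m, nn.GELU)]
    for name in gelu_names:
        parent_name, _, attr = name.rpartition(".")
        parent = mlp.get_submodule(parent_name) if parent_name else mlp
        setattr(parent, attr, nn.ReLU())
    opt = torch.optim.AdamW(mlp.parameters(), lr=lr)
    for step in range(steps):
        x = calib_batches[step % len(calib_batches)]
        with torch.no_grad():
            target = fp_mlp(x)
        err = (mlp(x) - target) ** 2
        loss = (h_diag * err).mean() if h_diag is not None else err.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mlp
```

In a block-wise reconstruction loop, `perturbation_hessian_diag` would be computed on the calibration set and averaged over samples (the "Average" in APH), with `hessian_weighted_mse` then driving both quantizer calibration and the MR fine-tuning; the exact scheduling and averaging are design choices of the paper, not shown here.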
Similar Papers
IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers
CV and Pattern Recognition
Makes computer vision faster without losing quality.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
CV and Pattern Recognition
Makes AI image programs smaller, faster, and more accurate.
VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
CV and Pattern Recognition
Makes AI models that see and talk smaller.