InfoQ: Mixed-Precision Quantization via Global Information Flow
By: Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, and more
Potential Business Impact:
Makes AI models run efficiently on small devices.
Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer is a complex combinatorial optimization problem. Current state-of-the-art methods rely on computationally expensive search algorithms or on local sensitivity heuristics, such as Hessian-based proxies, which fail to capture the cascading global effects of quantization error. In this work, we argue that the quantization sensitivity of a layer should not be measured by its local properties, but by its impact on the information flow throughout the entire network. We introduce InfoQ, a novel framework for MPQ that is training-free in the bit-width search phase. InfoQ assesses layer sensitivity by quantizing each layer at different bit-widths and measuring, through a single forward pass, the resulting change in mutual information in the subsequent layers. This quantifies how much each layer's quantization impacts the network's information flow. The resulting scores are used to formulate bit-width allocation as an integer linear programming problem, which is solved efficiently to minimize total sensitivity under a given budget (e.g., model size or BitOps). Our retraining-free search phase provides a superior search-time/accuracy trade-off (using two orders of magnitude less data than state-of-the-art methods such as LIMPQ), while yielding up to a 1% accuracy improvement for MobileNetV2 and ResNet18 on ImageNet at high compression rates (14X and 10.66X).
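To make the two steps of the abstract concrete, here is a minimal sketch of the sensitivity-scoring idea. This is not the authors' code: `quantize_layer(model, name, bits)` is an assumed, user-supplied helper that fake-quantizes one layer in place, and the "change in mutual information" is approximated with a crude histogram estimator over a single calibration batch, since the abstract does not specify the exact estimator.

```python
# Sketch only, assuming PyTorch models and an external quantize_layer helper.
import copy
import numpy as np
import torch
from sklearn.metrics import mutual_info_score

def binned(t, bins=32):
    # Discretize activations so mutual_info_score can treat them as labels.
    a = t.detach().flatten().cpu().numpy()
    edges = np.histogram_bin_edges(a, bins=bins)
    return np.digitize(a, edges[1:-1])

def activations(model, x, layer_names):
    # Collect outputs of the named modules in a single forward pass.
    acts, hooks = {}, []
    for name, mod in model.named_modules():
        if name in layer_names:
            hooks.append(mod.register_forward_hook(
                lambda m, i, o, n=name: acts.__setitem__(n, o)))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return acts

def infoq_sensitivity(model, layer, bits, x, downstream, quantize_layer):
    # Score = information the downstream activations lose about their
    # full-precision counterparts once `layer` is quantized to `bits` bits.
    ref = activations(model, x, downstream)
    q_model = copy.deepcopy(model)
    quantize_layer(q_model, layer, bits)          # assumed helper, not shown
    q = activations(q_model, x, downstream)
    return sum(mutual_info_score(binned(ref[n]), binned(ref[n]))   # = H(ref)
               - mutual_info_score(binned(ref[n]), binned(q[n]))
               for n in downstream)
```

Once a sensitivity table has been built this way, the bit-width allocation described in the abstract can be written as an integer linear program. The sketch below uses PuLP with made-up sensitivity scores and layer sizes (names `S`, `P`, `budget_bits` are illustrative, not from the paper); a BitOps budget would simply swap in a different cost coefficient.

```python
import pulp

S = {0: {2: 0.90, 4: 0.30, 8: 0.05},    # hypothetical sensitivity scores
     1: {2: 0.60, 4: 0.20, 8: 0.02},
     2: {2: 1.20, 4: 0.45, 8: 0.10}}
P = {0: 1.0e6, 1: 2.5e6, 2: 0.5e6}      # parameters per layer (illustrative)
budget_bits = 4 * sum(P.values())       # e.g., a 4-bit-average size budget

prob = pulp.LpProblem("bit_allocation", pulp.LpMinimize)
x = {(l, b): pulp.LpVariable(f"x_{l}_{b}", cat="Binary")
     for l in S for b in S[l]}

# Objective: minimize total sensitivity of the chosen bit-widths.
prob += pulp.lpSum(S[l][b] * x[(l, b)] for l in S for b in S[l])

# Exactly one bit-width per layer.
for l in S:
    prob += pulp.lpSum(x[(l, b)] for b in S[l]) == 1

# Model-size budget in bits.
prob += pulp.lpSum(b * P[l] * x[(l, b)] for l in S for b in S[l]) <= budget_bits

prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = {l: b for (l, b) in x if x[(l, b)].value() == 1}
print(assignment)   # e.g., {0: 4, 1: 4, 2: 8}, depending on the scores
```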
Similar Papers
Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization
Machine Learning (CS)
Makes AI smarter by using less computer power.
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
CV and Pattern Recognition
Makes AI smarter with less computer power.
IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Machine Learning (CS)
Makes big AI models run on small devices.