WUSH: Near-Optimal Adaptive Transforms for LLM Quantization
By: Jiale Chen, Vage Egiazarian, Torsten Hoefler, and more
Potential Business Impact:
Makes AI models run efficiently using less memory and compute.
Quantization to low bitwidth is a standard approach for deploying large language models; however, a few extreme weights and activations stretch the dynamic range and reduce the effective resolution of the quantizer. A common mitigation is to apply a fixed orthogonal transform, such as a Hadamard matrix, before quantization, which typically reduces the dynamic range. Yet these transforms ignore the statistics of the data, and their optimality is currently not understood. In this work, we derive, for the first time, closed-form optimal linear blockwise transforms for joint weight-activation quantization using standard data-free quantizers for common numerical formats. Specifically, we provide derivations of the optimal adaptive (data-aware) transforms for round-to-nearest (RTN), AbsMax-scaled block quantizers for both integer and floating-point formats. The resulting construction, which we call WUSH, combines a Hadamard backbone with a data-dependent component based on second-order moments, yielding a non-orthogonal transform that is provably optimal under mild assumptions and remains structured for efficient implementation. Preliminary experimental results show that our approach consistently improves upon the Hadamard transform for common formats.
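To make the setup concrete, here is a minimal sketch of the baseline the abstract describes: blockwise AbsMax round-to-nearest (RTN) integer quantization, with and without an orthonormal Hadamard rotation applied to each block before quantizing. This is not the paper's WUSH construction; the data-dependent, second-order-moment component is omitted because its closed form is not given in this summary. The block size, bit width, helper names, and synthetic data with injected outliers are illustrative assumptions.

```python
# Illustrative sketch only (not WUSH): blockwise AbsMax RTN quantization,
# optionally preceded by an orthonormal Hadamard rotation per block.
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal scaling, so H.T is the inverse

def absmax_rtn_quantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize one block with AbsMax scaling and round-to-nearest; return dequantized values."""
    qmax = 2 ** (bits - 1) - 1            # symmetric signed integer range
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def block_quant_error(w: np.ndarray, block: int, bits: int, transform=None) -> float:
    """Mean squared RTN error over blocks, optionally quantizing in the transformed basis."""
    err = 0.0
    for i in range(0, w.size, block):
        x = w[i:i + block]
        if transform is not None:
            y = transform @ x
            xq = transform.T @ absmax_rtn_quantize(y, bits)  # undo the rotation
        else:
            xq = absmax_rtn_quantize(x, bits)
        err += np.sum((x - xq) ** 2)
    return err / w.size

rng = np.random.default_rng(0)
block, bits = 64, 4                       # illustrative choices
w = rng.standard_normal(4096)
w[::256] *= 20                            # inject a few outlier weights
H = hadamard(block)
print("RTN MSE, no transform:", block_quant_error(w, block, bits))
print("RTN MSE, Hadamard    :", block_quant_error(w, block, bits, transform=H))
```

On data with a few large outliers, the rotated blocks typically show lower quantization error, which is the dynamic-range reduction that fixed transforms provide and that the paper's adaptive, data-aware transform aims to improve upon.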
Similar Papers
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
Computation and Language
Makes AI smarter and faster using less computer power.
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
Machine Learning (CS)
Makes big AI models fit on phones.
Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs
Machine Learning (CS)
Makes smart computer brains run faster on phones.