QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
By: Yuxuan Hu, Xiaodong Chen, Cuiping Li, and more
Potential Business Impact:
Makes big computer brains work faster, smarter.
Large Language Models (LLMs) excel across diverse applications but are inefficient to deploy due to their massive scale. While quantization reduces computational costs, existing methods degrade accuracy on medium-sized LLMs (e.g., Llama-3-8B) because of activation outliers. To address this, we propose QUAD (Quantization with Activation Decomposition), a framework that leverages Singular Value Decomposition (SVD) to suppress activation outliers and enable effective 4-bit quantization. QUAD estimates activation singular vectors offline from calibration data to construct an orthogonal transformation matrix P, which shifts outliers into additional dimensions kept in full precision while quantizing the remaining components to 4-bit. Additionally, QUAD enables parameter-efficient fine-tuning via adaptable full-precision outlier weights, narrowing the accuracy gap between quantized and full-precision models. Experiments demonstrate that QUAD achieves 94%–96% accuracy under W4A4 quantization and 98% accuracy with W4A4/A8 quantization plus parameter-efficient fine-tuning for Llama-3 and Qwen-2.5 models. Our code is available at https://github.com/hyx1999/Quad.
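To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' released code): it builds an orthogonal transform from the SVD of calibration activations, keeps the top-r "outlier" directions in full precision, and fake-quantizes the remaining rotated dimensions to 4-bit. The function names and the simple per-tensor quantizer are illustrative assumptions, not QUAD's actual implementation.

```python
# Hypothetical sketch of the SVD-based activation decomposition idea.
# Assumptions: calibration activations fit in memory, symmetric per-tensor
# int4 fake quantization, and the rotation P is later absorbed into the
# adjacent weight matrices (W' = P^T W) as the abstract describes.
import torch

def build_transform(calib_acts: torch.Tensor) -> torch.Tensor:
    """calib_acts: (num_tokens, hidden_dim) calibration activations.
    Returns an orthogonal matrix P whose leading columns span the
    dominant activation directions, where outliers concentrate."""
    # Right singular vectors of the activation matrix, sorted by singular value.
    _, _, Vh = torch.linalg.svd(calib_acts, full_matrices=False)
    return Vh.T  # (hidden_dim, hidden_dim), orthonormal columns

def quant_dequant_4bit(x: torch.Tensor) -> torch.Tensor:
    """Illustrative symmetric per-tensor 4-bit fake quantization."""
    scale = x.abs().max() / 7.0 + 1e-8          # int4 range: [-8, 7]
    return torch.clamp(torch.round(x / scale), -8, 7) * scale

def transform_and_quantize(x: torch.Tensor, P: torch.Tensor, r: int) -> torch.Tensor:
    """Rotate activations by P, keep the first r dimensions in full
    precision, and quantize the remaining dimensions to 4-bit."""
    z = x @ P                                   # rotated activations
    z_outlier, z_rest = z[:, :r], z[:, r:]      # outlier dims vs. regular dims
    return torch.cat([z_outlier, quant_dequant_4bit(z_rest)], dim=1)
```

In this sketch the first r columns of P capture the outlier-heavy singular directions, so only a small slice of the rotated activations stays in full precision; those extra full-precision dimensions are also the natural place to attach trainable weights for the parameter-efficient fine-tuning the abstract mentions.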
Similar Papers
Turning LLM Activations Quantization-Friendly
Machine Learning (CS)
Makes AI smarter and cheaper to run.
Achieving binary weight and activation for LLMs using Post-Training Quantization
Machine Learning (CS)
Makes big AI models much smaller and faster.
Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs
Machine Learning (CS)
Makes smart computer brains run faster on phones.