Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective
By: Tianyao Shi, Yi Ding
Potential Business Impact:
Makes AI models run faster and use less power.
Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing model precision to lower-bit formats) critical for efficient serving. While many quantization methods exist, a systematic understanding of their performance, energy, and quality tradeoffs under realistic serving conditions is still lacking. In this work, we first develop qMeter, a fully automated online characterization framework, and then conduct an in-depth characterization of 11 post-training LLM quantization methods across four model sizes (7B to 70B) and two GPU architectures (A100 and H100). We evaluate quantization at the application, workload, parallelism, and hardware levels under online serving conditions. Our study reveals highly task- and method-dependent tradeoffs, strong sensitivity to workload characteristics, and complex interactions with parallelism and GPU architecture. We further present three optimization case studies illustrating deployment challenges in capacity planning, energy-efficient scheduling, and multi-objective tuning. To the best of our knowledge, this is one of the first comprehensive application-, system-, and hardware-level characterizations of LLM quantization from a joint performance, energy, and quality perspective.
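To illustrate the core idea behind the post-training quantization methods the paper characterizes, here is a minimal sketch (not taken from the paper or its qMeter framework) of symmetric per-tensor int8 weight quantization, the basic operation of mapping float weights to a lower-bit format and back:

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization.
# Real methods studied in the paper (e.g., weight-only or weight+activation
# schemes) use more sophisticated scaling and calibration.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 using a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0                      # largest magnitude -> 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    w_hat = dequantize(q, s)
    print("max abs quantization error:", np.abs(w - w_hat).max())
```

Storing weights as int8 (or lower) cuts memory footprint and bandwidth roughly in proportion to the bit width, which is the source of the performance and energy gains the paper measures, at the cost of the approximation error shown above.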
Similar Papers
SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
Machine Learning (CS)
Makes small AI models work on phones.
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Computation and Language
Makes big AI models run on small phones.
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Artificial Intelligence
Makes big computer brains use less power.