Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models
By: Chenxi Zhou, Pengfei Cao, Jiang Li, et al.
Potential Business Impact:
Makes big AI models smaller without losing smarts.
Large language models (LLMs) present significant deployment challenges due to their scale, with post-training quantization (PTQ) emerging as a practical compression solution. However, a comprehensive understanding of how PTQ precisely impacts diverse LLM knowledge capabilities remains elusive, and existing scaling laws for quantized models often overlook crucial PTQ-specific parameters and task-specific sensitivities. This paper addresses these gaps by conducting an extensive empirical investigation to establish task-stratified scaling laws. We disentangle LLM knowledge into memorization and utilization capabilities and develop a unified quantitative framework that incorporates model size, effective bit-width, calibration set size, and group size. Our central finding reveals that knowledge memorization exhibits markedly greater sensitivity to variations in effective bit-width, calibration set size, and model size compared to the more robust knowledge utilization. These findings offer a fine-grained understanding of PTQ's impact and provide guidance for developing knowledge-aware quantization strategies that can better preserve targeted cognitive functions.
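The abstract does not reproduce the fitted scaling-law equation itself. As a rough illustration only, the sketch below assumes a simple multiplicative power-law over model size (N, in billions of parameters), effective bit-width (B), calibration set size (C), and group size (G), fit with scipy.optimize.curve_fit on synthetic data. The functional form, coefficient names, and data are illustrative assumptions, not the authors' actual law.

# Illustrative sketch only: the functional form below is an ASSUMPTION,
# not the paper's fitted scaling law. It shows how a task-stratified
# power-law over model size, bit-width, calibration set size, and
# group size could be fit to benchmark scores.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, a, alpha, beta, gamma, delta):
    # Hypothetical multiplicative power-law for a post-quantization task score.
    N, B, C, G = X
    return a * N**alpha * B**beta * C**gamma * G**(-delta)

# Toy synthetic observations standing in for measured benchmark scores.
rng = np.random.default_rng(0)
N = rng.choice([1.0, 3.0, 7.0, 13.0], size=64)   # model size, billions of parameters
B = rng.choice([2.0, 3.0, 4.0, 8.0], size=64)    # effective bit-width
C = rng.choice([32.0, 128.0, 512.0], size=64)    # calibration set size (samples)
G = rng.choice([32.0, 64.0, 128.0], size=64)     # quantization group size
score = 0.5 * N**0.08 * B**0.30 * C**0.02 * G**(-0.01)
score += rng.normal(0.0, 0.005, size=64)         # observation noise

# Fit the assumed law; in practice one fit would be run per task stratum
# (e.g., knowledge memorization vs. knowledge utilization benchmarks).
popt, _ = curve_fit(scaling_law, (N, B, C, G), score,
                    p0=[0.5, 0.1, 0.2, 0.05, 0.05], maxfev=20000)
print(dict(zip(["a", "alpha", "beta", "gamma", "delta"], popt)))

Comparing the fitted exponents across strata is one way the paper's headline claim could be read: memorization-style tasks would show larger exponents on bit-width, calibration set size, and model size than utilization-style tasks.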
Similar Papers
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Computation and Language
Makes big AI models run on small phones.
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Computation and Language
Makes big AI models run faster and smaller.