HAS-VQ: Hessian-Adaptive Sparse Vector Quantization for High-Fidelity LLM Compression
By: Vladimer Khasia
Potential Business Impact:
Makes AI models smaller without losing smarts.
Post-training quantization is essential for deploying Large Language Models (LLMs) on resource-constrained devices. However, standard integer quantization (e.g., INT4) fundamentally degrades performance by imposing a uniform grid on the heavy-tailed distribution of weight parameters, particularly in smaller-scale models (e.g., <2B parameters). We introduce HAS-VQ (Hessian-Adaptive Sparse Vector Quantization), a compression framework that strictly decouples high-sensitivity outliers from the bulk weight distribution using second-order sensitivity analysis. HAS-VQ employs a Hessian-Masked Decoupling strategy to isolate sensitive parameters, followed by robust Vector Quantization (VQ) of the remaining dense body. Crucially, we introduce a residual sparse feedback mechanism that corrects quantization errors in the most sensitive dimensions, ensuring exact reconstruction of outliers. We evaluate HAS-VQ on SmolLM2-1.7B, demonstrating two distinct regimes of superiority: (1) Pareto Dominance over Integer Baselines: At 4.23 effective bits-per-parameter (BPP), we achieve a perplexity of 14.23, significantly outperforming the standard INT4 baseline (20.03 PPL at 4.71 BPP). (2) High-Fidelity Compression: Relative to the FP16 baseline, HAS-VQ achieves a 2.3x reduction in model size (7.03 BPP) while maintaining statistically indistinguishable perplexity (10.12 vs. 10.04), effectively offering a lossless compression alternative for bandwidth-constrained environments. The code is available at https://github.com/VladimerKhasia/HASVQ
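To make the pipeline concrete, here is a minimal NumPy sketch of the three stages the abstract describes: Hessian-masked decoupling, vector quantization of the dense body, and sparse residual feedback. The diagonal-Hessian proxy (diag(X^T X)/n), the plain k-means codebook, the 1% outlier fraction, and all function names are illustrative assumptions rather than the paper's exact implementation; see the linked repository for that.

```python
# Minimal, illustrative sketch of a HAS-VQ-style pipeline (assumptions
# throughout; not the authors' implementation).
import numpy as np

def diagonal_hessian_proxy(X):
    """Per-input-dimension curvature proxy: diag(X^T X) / n.
    A common second-order sensitivity surrogate (assumption)."""
    return (X ** 2).mean(axis=0)

def hessian_masked_decoupling(W, h_diag, outlier_frac=0.01):
    """Mark the highest-sensitivity weights as outliers and zero them
    out of the dense body so VQ only sees the well-behaved bulk."""
    sensitivity = (W ** 2) * h_diag[None, :]        # saliency per weight
    k = max(1, int(outlier_frac * W.size))
    thresh = np.partition(sensitivity.ravel(), -k)[-k]
    mask = sensitivity >= thresh                    # outlier positions
    body = np.where(mask, 0.0, W)                   # dense bulk to quantize
    return body, mask

def kmeans_codebook(vectors, n_codes=256, iters=20, seed=0):
    """Plain k-means over weight sub-vectors to build a VQ codebook."""
    rng = np.random.default_rng(seed)
    codes = vectors[rng.choice(len(vectors), n_codes, replace=False)]
    for _ in range(iters):
        d = ((vectors[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for c in range(n_codes):
            members = vectors[assign == c]
            if len(members):
                codes[c] = members.mean(axis=0)
    return codes, assign

def has_vq_sketch(W, X, vec_dim=4, n_codes=256, outlier_frac=0.01):
    h = diagonal_hessian_proxy(X)
    body, mask = hessian_masked_decoupling(W, h, outlier_frac)
    codes, assign = kmeans_codebook(body.reshape(-1, vec_dim), n_codes)
    W_vq = codes[assign].reshape(W.shape)
    # Residual sparse feedback: store the exact error only at the
    # Hessian-selected positions, so outliers reconstruct exactly.
    residual = np.where(mask, W - W_vq, 0.0)
    return W_vq + residual

# Toy usage: a 64x64 weight matrix with random calibration activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
X = rng.standard_normal((512, 64)).astype(np.float32)
W_hat = has_vq_sketch(W, X)
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```

The design point the sketch tries to capture is that the sparse residual is confined to Hessian-selected positions: the dense body can then tolerate an aggressive codebook, while the few weights that dominate second-order loss sensitivity are reconstructed without error.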
Similar Papers
Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression
CV and Pattern Recognition
Makes videos smaller for faster streaming.
VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
CV and Pattern Recognition
Makes AI models that see and talk smaller.
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Machine Learning (CS)
Makes AI smaller and faster for phones.