Neural Weight Compression for Language Models
By: Jegwang Ryu, Minkyu Kim, Seungjun Shin, and more
Potential Business Impact:
Makes AI models smaller and faster to use.
The efficient storage and transmission of language model weights is becoming increasingly important, as their scale and adoption continue to grow. However, as our understanding of this new data modality is limited, designing a good compression algorithm for language model weights heavily relies on manual, trial-and-error approaches. In this paper, we propose a learned compression framework that trains neural codecs directly from pretrained language model weights. Unlike conventional data (e.g., images), language model weights pose unique challenges: the sizes and shapes of weight tensors vary significantly, and the reconstruction quality must be judged by downstream model predictions rather than naïve MSE loss. To address this, we introduce Neural Weight Compression (NWC), a novel autoencoder-based neural codec tailored to model weight compression. The proposed method inherits the advantages of autoencoder-based codecs while incorporating three technical components: (1) column-wise tensor chunking and normalization; (2) an importance-aware training loss; (3) an inference-time error compensation mechanism guided by model outputs. Experiments on open-weight language models show that NWC achieves competitive or state-of-the-art accuracy-compression tradeoffs, with particularly strong results at 4-6 bit precisions where accuracy remains nearly on par with FP16 models.
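To make the first two components more concrete, here is a minimal PyTorch sketch of column-wise chunking with per-chunk normalization and an importance-weighted reconstruction loss. This is not the authors' implementation: the chunk width, the per-chunk statistics, and the `importance_weighted_mse` helper (with importance scores assumed to come from something like activation or sensitivity statistics) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def chunk_and_normalize(weight: torch.Tensor, chunk_cols: int = 64):
    """Split a 2-D weight tensor into fixed-width column chunks and normalize
    each chunk, so tensors of very different shapes and scales can be fed to
    a single neural codec. Returns the normalized chunks plus the per-chunk
    statistics needed to invert the normalization after decoding.
    (Illustrative sketch; the paper's exact chunking scheme may differ.)"""
    out_dim, in_dim = weight.shape
    pad = (-in_dim) % chunk_cols                      # pad so columns divide evenly
    w = F.pad(weight, (0, pad))
    chunks = w.view(out_dim, -1, chunk_cols)          # (out_dim, n_chunks, chunk_cols)
    chunks = chunks.transpose(0, 1).contiguous()      # (n_chunks, out_dim, chunk_cols)
    mean = chunks.mean(dim=(1, 2), keepdim=True)
    std = chunks.std(dim=(1, 2), keepdim=True) + 1e-8
    return (chunks - mean) / std, mean, std

def importance_weighted_mse(recon: torch.Tensor,
                            target: torch.Tensor,
                            importance: torch.Tensor) -> torch.Tensor:
    """Importance-aware reconstruction loss: per-element weights emphasize the
    parameters that matter most for downstream predictions, rather than
    treating all weights equally as plain MSE would."""
    return (importance * (recon - target) ** 2).mean()

# Hypothetical usage: normalize chunks of an FFN projection, then train the
# autoencoder codec on them with the weighted loss.
w = torch.randn(4096, 11008)
chunks, mu, sigma = chunk_and_normalize(w)
importance = torch.ones_like(chunks)                  # placeholder importance scores
recon = chunks + 0.01 * torch.randn_like(chunks)      # stand-in for codec output
loss = importance_weighted_mse(recon, chunks, importance)
# Reconstruction would undo the normalization: w_hat_chunk = recon * sigma + mu.
```

The inference-time error compensation mechanism mentioned as the third component is not sketched here, since the abstract only indicates that it is guided by model outputs.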
Similar Papers
Test-Time Steering for Lossless Text Compression via Weighted Product of Experts
Computation and Language
Makes computer files smaller without losing any info.
Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware
Machine Learning (CS)
Makes smart computer programs run much faster.
Lossless Compression of Neural Network Components: Weights, Checkpoints, and K/V Caches in Low-Precision Formats
Machine Learning (CS)
Shrinks AI models to save space and speed.