Lossless Compression for LLM Tensor Incremental Snapshots

Published: May 14, 2025 | arXiv ID: 2505.09810v1

By: Daniel Waddington, Cornel Constantinescu

Potential Business Impact:

Makes AI training faster by shrinking data.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

During the training of Large Language Models (LLMs), tensor data is periodically "checkpointed" to persistent storage so that work can be recovered in the event of failure. The volume of data that must be copied during each checkpoint, even when using reduced-precision representations such as bfloat16, often reaches hundreds of gigabytes. Furthermore, the data must be moved across a network and written to a storage system before the next epoch occurs. With a view to ultimately building an optimized checkpointing solution, this paper presents an experimental analysis of checkpoint data, which is used to derive a design that maximizes the use of lossless compression to reduce the volume of data. We examine how tensor data and its compressibility evolve during model training, and we evaluate the efficacy of common off-the-shelf general-purpose compression engines combined with known data optimization techniques such as byte-grouping and incremental delta compression. Leveraging our analysis, we have built an effective compression solution, known as Language Model Compressor (LMC), which is based on byte-grouping and Huffman encoding. LMC achieves better compression than the best alternative (BZ2) with an order-of-magnitude reduction in the time needed to perform the compression. We show that a 16-core parallel implementation of LMC can attain compression and decompression throughput of 2.78 GiB/s and 3.76 GiB/s, respectively. This increase in performance ultimately reduces the CPU resources needed and provides more time to copy the data to the storage system before the next epoch, thus allowing for higher-frequency checkpoints.
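The two data optimization techniques named in the abstract, byte-grouping and incremental delta compression, can be illustrated with a minimal sketch. This is not the LMC implementation: the function names are hypothetical, the weights are synthetic, the delta is assumed to be a byte-wise XOR against the previous snapshot, and the standard-library `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in for the entropy coder.

```python
import random
import struct
import zlib


def to_bfloat16_bytes(values):
    """Truncate float32 values to bfloat16 (their top 16 bits), little-endian."""
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out += struct.pack("<H", bits >> 16)
    return bytes(out)


def byte_group(data, width=2):
    """Gather byte k of every element into one contiguous stream per k."""
    assert len(data) % width == 0
    return b"".join(data[k::width] for k in range(width))


def byte_ungroup(grouped, width=2):
    """Invert byte_group, restoring the original interleaved layout."""
    assert len(grouped) % width == 0
    n = len(grouped) // width
    out = bytearray(len(grouped))
    for k in range(width):
        out[k::width] = grouped[k * n:(k + 1) * n]
    return bytes(out)


def xor_delta(current, previous):
    """Byte-wise XOR of two equal-length snapshots (assumed delta scheme).

    XOR is its own inverse, so xor_delta(delta, previous) recovers current.
    """
    assert len(current) == len(previous)
    return bytes(a ^ b for a, b in zip(current, previous))


# Synthetic stand-in for two consecutive checkpoints of one tensor.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10000)]
prev = to_bfloat16_bytes(weights)
curr = to_bfloat16_bytes([w + rng.gauss(0.0, 1e-4) for w in weights])

grouped = byte_group(curr)
delta = xor_delta(curr, prev)

# Both transforms are lossless: the original bytes are recovered exactly.
assert byte_ungroup(grouped) == curr
assert xor_delta(delta, prev) == curr

# Compare compressed sizes of the raw, grouped, and grouped-delta streams.
for name, blob in [("raw", curr), ("grouped", grouped),
                   ("grouped delta", byte_group(delta))]:
    print(name, len(zlib.compress(blob)))
```

Byte-grouping places the sign and exponent bytes, which tend to vary little across a tensor, in one stream and the noisier mantissa bytes in another, so a general-purpose compressor sees more regular input. The sizes printed here come from synthetic data and say nothing about the ratios reported in the paper.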

Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction

Machine Learning (CS)

Makes computer text smaller without losing information.

7 May 2025 2

89%

Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression

Databases

Saves space by shrinking computer language models.

30 Apr 2025 1

88%

Compression Laws for Large Language Models

Computation and Language

Makes big AI models smaller and faster.

6 Apr 2025 1

Page Count

14 pages
