LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
By: Antonio A. Ginart, Naveen Kodali, Jason Lee, and more
Potential Business Impact:
Stops AI from repeating itself when writing.
We introduce the LZ penalty, a penalty specialized for reducing degenerate repetitions in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding with the LZ penalty can be interpreted as sampling from the residual distribution after removing the information that is highly compressible. We demonstrate that the LZ penalty enables state-of-the-art open-source reasoning models to operate with greedy (temperature zero) decoding without loss of capability and without instances of degenerate repetition. In contrast, the industry-standard frequency penalty and repetition penalty are ineffective, incurring degenerate repetition rates of up to 4%.
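The abstract's core idea, penalizing tokens in proportion to how compressible they are under an LZ77-style match against the recent context, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the `alpha` scale, the `min_match` threshold, and the `log2(1 + match)` penalty term are all illustrative assumptions standing in for the true LZ77 codelength computation.

```python
import math

def lz_penalty(logits, context, alpha=1.0, min_match=2):
    """Illustrative sketch (not the paper's exact method): subtract from
    each candidate token's logit a penalty that grows with the length of
    the repeated substring that token would extend. Under LZ77, longer
    matches are cheaper to encode, i.e. more compressible, so they are
    penalized more."""
    penalized = dict(logits)
    n = len(context)
    for tok in logits:
        cand = context + [tok]
        # Find the longest suffix of (context + tok) that also occurs
        # as a contiguous substring earlier in the context.
        match = 0
        for L in range(min_match, n + 1):
            suffix = cand[-L:]
            found = any(context[i:i + L] == suffix for i in range(n - L + 1))
            if found:
                match = L
            else:
                break
        if match >= min_match:
            # Hypothetical codelength proxy: penalty grows with match length.
            penalized[tok] -= alpha * math.log2(1 + match)
    return penalized
```

For example, with context `[1, 2, 3, 1, 2]`, the token `3` would extend a repeat of the earlier trigram `[1, 2, 3]` and is penalized, while a novel token like `4` is left untouched. Industry-standard frequency and repetition penalties, by contrast, act on per-token counts only and cannot distinguish a long verbatim repeat from scattered reuse of common tokens.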
Similar Papers
LZD-style Compression Scheme with Truncation and Repetitions
Data Structures and Algorithms
Makes files smaller, faster, and better.
Enhancing Large Language Model Efficiency via Symbolic Compression: A Formal Approach Towards Interpretability
Artificial Intelligence
Makes AI understand code and logic better, cheaper.
zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
Computation and Language
Makes computer language models faster and cheaper.