From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis
By: Łukasz Dębowski
We inspect the deductive connection between the neural scaling law and Zipf's law -- two statements discussed in machine learning and quantitative linguistics. The neural scaling law describes how the cross-entropy rate of a foundation model -- such as a large language model -- changes with the number of training tokens, parameters, and compute. By contrast, Zipf's law posits that the distribution of tokens exhibits a power-law tail. Whereas similar claims have been made in more specific settings, we show that the neural scaling law is a consequence of Zipf's law under certain broad assumptions, which we spell out systematically. The derivation proceeds in three steps: we derive Heaps' law on vocabulary growth from Zipf's law, Hilberg's hypothesis on entropy scaling from Heaps' law, and the neural scaling law from Hilberg's hypothesis. We illustrate these inference steps with a toy example, the Santa Fe process, which satisfies all four statistical laws.
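For orientation, the four statistical laws are commonly written in forms such as the following; these are standard textbook formulations, and the exact statements and exponents used in the paper may differ:

\[
\begin{aligned}
&\text{Zipf's law:} && f(r) \propto r^{-\alpha}, \\
&\text{Heaps' law:} && V(n) \propto n^{\beta}, \\
&\text{Hilberg's hypothesis:} && H(X_1^n) \approx h\,n + A\,n^{\beta}, \\
&\text{Neural scaling law:} && L(D) \approx L_{\infty} + c\,D^{-\gamma},
\end{aligned}
\]

where f(r) is the frequency of the r-th most frequent token, V(n) is the number of distinct tokens in a text of length n, H(X_1^n) is the block entropy of n consecutive tokens, and L(D) is the model's cross-entropy loss as a function of the number of training tokens D.

As a further illustration, the following is a minimal Python sketch of a Santa Fe-like process in the spirit of the construction X_i = (K_i, Z_{K_i}) from Dębowski's earlier work: the indices K_i are drawn from a power-law (Zipf-like) distribution and the bits Z_k are persistent random "facts". All parameter values, the helper name fact, and the Heaps'-law check are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

beta = 0.5   # assumed scaling exponent; index distribution has tail exponent 1/beta
n = 100_000  # sequence length (assumed, for illustration only)

# Indices K_i drawn i.i.d. with P(K = k) proportional to k^(-1/beta).
K = rng.zipf(1.0 / beta, size=n)

# Persistent random "facts" Z_k in {0, 1}: generated lazily, but fixed once drawn.
facts = {}
def fact(k):
    if k not in facts:
        facts[k] = int(rng.integers(0, 2))
    return facts[k]

# Santa Fe-style tokens X_i = (K_i, Z_{K_i}).
tokens = [(int(k), fact(int(k))) for k in K]

# Heaps'-law check: number of distinct tokens as a function of text length.
seen = set()
vocab_sizes = []
for t in tokens:
    seen.add(t)
    vocab_sizes.append(len(seen))

for m in (1_000, 10_000, 100_000):
    print(f"length {m:>7}: distinct tokens {vocab_sizes[m - 1]}")

With the assumed exponent beta = 0.5, the printed vocabulary sizes should grow roughly like the square root of the text length, which is the Heaps-type behaviour that the abstract refers to.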