Complete asymptotic type-token relationship for growing complex systems with inverse power-law count rankings
By: Pablo Rosillo-Rodes, Laurent Hébert-Dufresne, Peter Sheridan Dodds
Potential Business Impact:
Explains how word counts predict new words.
The growth dynamics of complex systems often exhibit statistical regularities involving power-law relationships. For real finite complex systems formed by countable tokens (animals, words) as instances of distinct types (species, dictionary entries), an inverse power-law scaling $S \sim r^{-\alpha}$ between type count $S$ and type rank $r$, known as Zipf's law, is widely observed with varying degrees of fidelity. A secondary, summary relationship is Heaps' law, which states that the number of types scales sublinearly with the total number of observed tokens present in a growing system. Here, we propose an idealized model of a growing system that (1) deterministically produces arbitrary inverse power-law count rankings for types, and (2) allows us to determine the exact asymptotics of the type-token relationship. Our argument improves upon and remedies earlier work. We obtain a unified asymptotic expression for all values of $\alpha$, which corrects the special cases of $\alpha = 1$ and $\alpha \gg 1$. Our approach relies solely on the form of count rankings, avoids unnecessary approximations, and does not involve any stochastic mechanisms or sampling processes. We thereby demonstrate that a general type-token relationship arises solely as a consequence of Zipf's law.
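To make the link between count rankings and the type-token relationship concrete, the following is a minimal numerical sketch, not the authors' model. It assumes a hypothetical normalization in which a system with $N$ types has counts $S_r = (N/r)^{\alpha}$, so that the lowest-ranked type has count 1, and it treats counts as real numbers. The function names `total_tokens` and `local_heaps_exponent` are illustrative only. The sketch sums the counts to get the token total $M(N)$ and estimates the local slope of $\ln N$ against $\ln M$.

```python
import math


def total_tokens(n_types, alpha):
    """Token total M for n_types types with counts S_r = (n_types / r)**alpha.

    Assumption (not from the abstract): counts are real-valued and the
    lowest-ranked type (r = n_types) has count exactly 1.
    """
    return sum((n_types / r) ** alpha for r in range(1, n_types + 1))


def local_heaps_exponent(n_types, alpha, factor=2):
    """Estimate d ln N / d ln M between n_types and factor * n_types types."""
    m_lo = total_tokens(n_types, alpha)
    m_hi = total_tokens(factor * n_types, alpha)
    return math.log(factor) / math.log(m_hi / m_lo)


# Compare the estimated type-token exponent across three Zipf exponents.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(local_heaps_exponent(20000, alpha), 3))
```

Under this assumed normalization, the estimate should sit near $1/\alpha$ for $\alpha > 1$ and near 1 for $\alpha < 1$, while $\alpha = 1$ gives a value somewhat below 1 because of a logarithmic factor in $M(N)$. The example involves no sampling, which mirrors the abstract's point that the type-token relationship follows from the form of the count ranking alone.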
Similar Papers
Quadratic Term Correction on Heaps' Law
Computation and Language
Makes computer language models understand words better.
Sub-exponential Growth of New Words and Names Online: A Piecewise Power-Law Model
Physics and Society
Explains how ideas spread slower than expected.
From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis
Information Theory
Makes AI understand language better by finding patterns.