Score: 1

The LZ78 Source

Published: March 13, 2025 | arXiv ID: 2503.10574v2

By: Naomi Sagan, Amir Dembo, Tsachy Weissman

BigTech Affiliations: Stanford University

Potential Business Impact:

Helps computers learn from data that changes.

Business Areas:
Semantic Web Internet Services

We study a family of processes generated according to sequential probability assignments induced by the LZ78 universal compressor. We characterize entropic and distributional properties such as their entropy and relative entropy rates, finite-state compressibility and log loss of their realizations, and the empirical distributions that they induce. Though not quite stationary, these sources are "almost stationary and ergodic;" similar to stationary and ergodic processes, they satisfy a Shannon-McMillan-Breiman-type property: the normalized log probability of their realizations converges almost surely to their entropy rate. Further, they are locally "almost i.i.d." in the sense that the finite-dimensional empirical distributions of their realizations converge almost surely to a deterministic i.i.d. law. However, unlike stationary ergodic sources, the finite-state compressibility of their realizations is almost surely strictly larger than their entropy rate by a "Jensen gap." We present simulations demonstrating the theoretical results. Among their potential uses, these sources allow to gauge the performance of sequential probability models on non-Markovian non-stationary data.

Country of Origin
🇺🇸 United States

Page Count
34 pages

Category
Computer Science:
Information Theory