Zipf Distributions from Two-Stage Symbolic Processes: Stability Under Stochastic Lexical Filtering

Published: November 26, 2025 | arXiv ID: 2511.21060v1

By: Vladimir Berman

Potential Business Impact:

Explains why some words are common, others rare.

Business Areas:
A/B Testing, Data and Analytics

The origin of Zipf's law in language remains unsettled and is debated across fields. This study explains Zipf-like behavior through purely geometric mechanisms, with no linguistic assumptions. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, which produces a geometric distribution of word lengths. Interacting exponential effects then yield a power-law rank-frequency curve whose shape is determined by the alphabet size and the probability of the blank symbol. Simulations support the predictions, and the predicted curves match English, Russian, and mixed-genre data. The symbolic model suggests that Zipf-type laws arise from geometric constraints rather than from communicative efficiency.
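The mechanism described above can be illustrated with a minimal sketch. It is not the paper's FCWM, whose details are not given in this summary; it uses the classic random-typing model as a stand-in, assuming that each symbol is a blank with probability `blank_prob` and otherwise one of `alphabet_size` equally likely letters. The function names and parameters are hypothetical. For that stand-in model, the textbook rank-frequency exponent is 1 - ln(1 - q) / ln(m) for alphabet size m and blank probability q, which may differ from the formula derived in the paper.

```python
import math
import random
from collections import Counter


def simulate_words(alphabet_size, blank_prob, n_symbols, seed=0):
    """Emit random symbols and split them into words at blanks.

    Assumption: classic random typing, used here as a stand-in for FCWM.
    Returns a Counter mapping each word (tuple of letter indices) to its count.
    """
    rng = random.Random(seed)
    counts = Counter()
    current = []
    for _ in range(n_symbols):
        if rng.random() < blank_prob:
            if current:  # consecutive blanks do not create empty words
                counts[tuple(current)] += 1
                current = []
        else:
            current.append(rng.randrange(alphabet_size))
    return counts


def mean_word_length(counts):
    """Average word length over all emitted tokens (geometric: about 1/q)."""
    total = sum(counts.values())
    return sum(len(word) * c for word, c in counts.items()) / total


def predicted_exponent(alphabet_size, blank_prob):
    """Textbook exponent for the random-typing stand-in model."""
    return 1.0 - math.log(1.0 - blank_prob) / math.log(alphabet_size)


def fitted_exponent(counts, max_rank):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped."""
    freqs = sorted(counts.values(), reverse=True)[:max_rank]
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    words = simulate_words(alphabet_size=4, blank_prob=0.2, n_symbols=200_000)
    print("mean word length:", mean_word_length(words))
    print("predicted exponent:", predicted_exponent(4, 0.2))
    print("fitted exponent:", fitted_exponent(words, max_rank=300))
```

The rank-frequency curve of this stand-in model is a staircase, because all words of the same length are equally likely, so the fitted slope only approximates the predicted exponent and depends on the rank cutoff chosen.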

Page Count
16 pages

Category
Statistics: Methodology