Score: 0

Online computation of normalized substring complexity

Published: October 18, 2025 | arXiv ID: 2510.16454v1

By: Gregory Kucherov, Yakov Nekrich

Potential Business Impact:

Finds patterns in text super fast.

Business Areas:
Text Analytics Data and Analytics, Software

The normalized substring complexity $\delta$ of a string is defined as $\max_k \{c[k]/k\}$, where $c[k]$ is the number of \textit{distinct} substrings of length $k$. This simply defined measure has recently attracted attention due to its established relationship to popular string compression algorithms. We consider the problem of computing $\delta$ online, when the string is provided from a stream. We present two algorithms solving the problem: one working in $O(\log n)$ amortized time per character, and the other in $O(\log^3 n)$ worst-case time per character. To our knowledge, this is the first polylog-time online solution to this problem.

Country of Origin
🇺🇸 United States

Page Count
15 pages

Category
Computer Science:
Data Structures and Algorithms