Online computation of normalized substring complexity
By: Gregory Kucherov, Yakov Nekrich
Potential Business Impact:
Finds patterns in text super fast.
The normalized substring complexity $\delta$ of a string is defined as $\max_k \{c[k]/k\}$, where $c[k]$ is the number of \textit{distinct} substrings of length $k$. This simply defined measure has recently attracted attention due to its established relationship to popular string compression algorithms. We consider the problem of computing $\delta$ online, when the string is provided from a stream. We present two algorithms solving the problem: one working in $O(\log n)$ amortized time per character, and the other in $O(\log^3 n)$ worst-case time per character. To our knowledge, this is the first polylog-time online solution to this problem.
Similar Papers
Expected Length of the Longest Common Subsequence of Multiple Strings
Combinatorics
Finds patterns in random text faster.
Tight Lower Bounds for Central String Queries in Compressed Space
Data Structures and Algorithms
Find text faster using less computer memory.
Improved Online Sorting
Data Structures and Algorithms
Sorts numbers faster, saving space and time.