Efficient Computation of Closed Substrings
By: Samkith K Jain, Neerja Mhaskar
Potential Business Impact:
Finds special repeating patterns in text quickly.
A closed string $u$ is either of length one or has a border that occurs only as a prefix and as a suffix of $u$, and nowhere else in $u$. In this paper, we present a fast and practical $O(n \log n)$-time algorithm that computes all $\Theta(n^2)$ closed substrings of a string $w[1..n]$ by introducing a compact representation of them that uses only $O(n \log n)$ space. We also present a simple and space-efficient solution for computing all maximal closed substrings (MCSs) using the suffix array ($\mathsf{SA}$) and the longest common prefix ($\mathsf{LCP}$) array of $w[1..n]$. Finally, we determine the number of MCSs, $M(f_n)$, in a Fibonacci word $f_n$ for $n \geq 5$, and show that $M(f_n) \approx \left(1 + \frac{1}{\phi^2}\right) F_n \approx 1.382 F_n$, where $\phi$ is the golden ratio.
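To make the definition of a closed string concrete, the following is a minimal brute-force sketch in Python. It is not the paper's $O(n \log n)$ algorithm and does not use the compact representation or the $\mathsf{SA}$/$\mathsf{LCP}$ arrays; it only checks the definition directly. The function names (`occurrences`, `is_closed`, `closed_substrings`, `mcs_fibonacci_constant`) are hypothetical and chosen for illustration, and the sketch assumes that the border in the definition is nonempty.

```python
import math


def occurrences(pattern, text):
    """Count occurrences of pattern in text, including overlapping ones."""
    return sum(
        1
        for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def is_closed(u):
    """Return True if u is closed, checked directly from the definition.

    Assumption: the border must be nonempty, so a string such as "ab"
    (whose only border is the empty string) is not closed.
    """
    if len(u) == 1:
        return True
    for b in range(1, len(u)):
        border = u[:b]
        # A border is both a proper prefix and a proper suffix of u.
        # It occurs "only as a prefix and as a suffix" exactly when
        # it has two occurrences in total.
        if u.endswith(border) and occurrences(border, u) == 2:
            return True
    return False


def closed_substrings(w):
    """List all (start, end) pairs, 0-indexed and half-open, with w[start:end] closed.

    Brute force for illustration only; far slower than the O(n log n)
    method described in the abstract.
    """
    n = len(w)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if is_closed(w[i:j])
    ]


def mcs_fibonacci_constant():
    """Evaluate the constant 1 + 1/phi^2 from the abstract."""
    phi = (1 + math.sqrt(5)) / 2
    return 1 + 1 / phi**2
```

For example, `is_closed("abab")` holds because the border `ab` occurs only at the start and at the end, while `is_closed("abaa")` does not, since the only nonempty border `a` also occurs in the middle. Evaluating `mcs_fibonacci_constant()` reproduces the constant of about 1.382 quoted in the abstract.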