Potential Business Impact:
Finds words in compressed text very quickly.
Let $T [1..n]$ be a text over an alphabet of size $\sigma \in \mathrm{polylog} (n)$, let $r^*$ be the sum of the numbers of runs in the Burrows-Wheeler Transforms of $T$ and its reverse, and let $z$ be the number of phrases in the LZ77 parse of $T$. We show how to store $T$ in $O (r^* \log (n / r^*) + z \log n)$ bits such that, given a pattern $P [1..m]$, we can report the locations of the $\mathrm{occ}$ occurrences of $P$ in $T$ in $O (m \log n + \mathrm{occ} \log^\epsilon n)$ time. We can also report the positions of the leftmost and rightmost occurrences of $P$ in $T$ in the same space and $O (m \log^\epsilon n)$ time.
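To make the two compression measures in the abstract concrete, the following is a minimal sketch that computes $r^*$ and $z$ for a short string by brute force. It is not the paper's data structure, and it makes several assumptions: the function names (`bwt`, `count_runs`, `r_star`, `lz77_phrases`) are hypothetical, the BWT is taken over the text with a unique sentinel `\x00` appended (so the text itself must not contain that character), and the LZ77 variant is the greedy parse in which each phrase is either a single new character or the longest prefix of the remaining text that also starts at an earlier position, with overlaps allowed. Other LZ77 variants can give slightly different phrase counts.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before every
    other character. Far too slow for real inputs; for illustration only.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def r_star(text: str) -> int:
    """Sum of the BWT run counts of the text and of its reverse."""
    return count_runs(bwt(text)) + count_runs(bwt(text[::-1]))


def lz77_phrases(text: str) -> int:
    """Phrase count of a greedy LZ77 parse (assumed variant, see lead-in)."""
    i, z, n = 0, 0, len(text)
    while i < n:
        longest = 0
        # Try every earlier starting position j; the copy may overlap i.
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            longest = max(longest, k)
        # A phrase is the longest earlier match, or one new character.
        i += max(longest, 1)
        z += 1
    return z


if __name__ == "__main__":
    for t in ("banana", "ab" * 16):
        print(repr(t), "n =", len(t), "r* =", r_star(t), "z =", lz77_phrases(t))
```

On repetitive inputs both measures stay small relative to $n$, which is the setting in which a bound of $O (r^* \log (n / r^*) + z \log n)$ bits is attractive.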
Similar Papers
Compressed Dictionary Matching on Run-Length Encoded Strings
Data Structures and Algorithms
Finds words in zipped text faster.
Dynamic r-index: An Updatable Self-Index for Highly Repetitive Strings
Data Structures and Algorithms
Lets computers find words in changing text fast.
Merging RLBWTs adaptively
Data Structures and Algorithms
Merges compressed text data much faster.