Dynamic r-index: An Updatable Self-Index for Highly Repetitive Strings
By: Takaaki Nishimoto, Yasuo Tabei
Potential Business Impact:
Lets computers find words in changing text fast.
A self-index is a compressed data structure that supports locate queries, which report all positions where a given pattern occurs in a string. While many self-indexes have been proposed, developing dynamically updatable ones that support string insertions and deletions remains a challenge. The r-index (Gagie et al., SODA'18) is a representative static self-index based on the run-length Burrows-Wheeler transform (RLBWT), designed for highly repetitive strings, that is, strings with many repeated substrings. We present the dynamic r-index, an extension of the r-index that supports locate queries in $\mathcal{O}((m + \mathsf{occ}) \log n)$ time using $\mathcal{O}(r)$ words, where $n$ is the length of the string $T$, $m$ is the pattern length, $\mathsf{occ}$ is the number of occurrences, and $r$ is the number of runs in the RLBWT of $T$. It supports string insertions and deletions in $\mathcal{O}((m + L_{\mathsf{max}}) \log n)$ time, where $L_{\mathsf{max}}$ is the maximum value in the LCP array of $T$. The average running time is $\mathcal{O}((m + L_{\mathsf{avg}}) \log n)$, where $L_{\mathsf{avg}}$ is the average LCP value. We experimentally evaluated the dynamic r-index on various highly repetitive strings and demonstrated its practicality.
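To make the parameters in the bounds above concrete, the following minimal sketch computes $n$, $r$, $L_{\mathsf{max}}$, and $L_{\mathsf{avg}}$ for a small string. It is not the dynamic r-index itself: it builds the suffix array naively, assumes the text ends with a unique sentinel `$` that is smaller than every other character, and all function names are hypothetical and chosen only for illustration.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    # BWT[i] is the character preceding the i-th smallest suffix.
    # For the suffix starting at 0, text[-1] wraps around to the sentinel.
    return "".join(text[i - 1] for i in sa)


def count_runs(s):
    # Number of maximal runs of equal characters (r when s is the BWT).
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def lcp_array(text, sa):
    # lcp[i] = length of the longest common prefix of the suffixes
    # at sa[i - 1] and sa[i]; lcp[0] is defined as 0.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        length = 0
        while (a + length < len(text) and b + length < len(text)
               and text[a + length] == text[b + length]):
            length += 1
        lcp[k] = length
    return lcp


def parameters(text):
    # Returns (n, r, L_max, L_avg) for a sentinel-terminated text.
    sa = suffix_array(text)
    bwt = bwt_from_sa(text, sa)
    lcp = lcp_array(text, sa)
    return len(text), count_runs(bwt), max(lcp), sum(lcp) / len(lcp)


if __name__ == "__main__":
    print(parameters("banana$"))      # n = 7, r = 5, L_max = 3
    print(parameters("ab" * 8 + "$"))  # n = 17, r = 3
```

For `"banana$"` the BWT is `annb$aa`, which has 5 runs. For the highly repetitive string `"ab" * 8 + "$"` the BWT is `bbbbbbbb$aaaaaaaa`, so $r = 3$ while $n = 17$; this gap between $r$ and $n$ is what an index using $\mathcal{O}(r)$ words exploits.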
Similar Papers
r*-indexing
Data Structures and Algorithms
Find words in text super fast.
Compressed Dictionary Matching on Run-Length Encoded Strings
Data Structures and Algorithms
Finds words in zipped text faster.
R-enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies
Data Structures and Algorithms
Finds patterns in text faster and with less memory.