r*-indexing

Published: August 18, 2025 | arXiv ID: 2508.12675v1

By: Travis Gagie

Potential Business Impact:

Find words in text super fast.

Let $T[1..n]$ be a text over an alphabet of size $\sigma \in \mathrm{polylog}(n)$, let $r^*$ be the sum of the numbers of runs in the Burrows-Wheeler Transforms of $T$ and its reverse, and let $z$ be the number of phrases in the LZ77 parse of $T$. We show how to store $T$ in $O(r^* \log (n / r^*) + z \log n)$ bits such that, given a pattern $P[1..m]$, we can report the locations of the $\mathrm{occ}$ occurrences of $P$ in $T$ in $O(m \log n + \mathrm{occ} \log^\epsilon n)$ time. We can also report the positions of the leftmost and rightmost occurrences of $P$ in $T$ in the same space and $O(m \log^\epsilon n)$ time.
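To make the two compressibility measures in the bounds concrete, here is a minimal Python sketch that computes $r^*$ and $z$ for a short text. It is not the paper's index, only an illustration of the quantities the space bound depends on. Several choices are assumptions not stated in the abstract: the function names are hypothetical, the BWT is built by sorting rotations of the text with an appended sentinel `\0` (so the sentinel counts toward the runs), and the LZ77 variant is the greedy one where each phrase is the longest previously occurring factor (overlap allowed) plus one new character. Other conventions can shift the counts by small amounts.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler Transform via sorted rotations (quadratic; illustration only).

    Assumes the sentinel does not occur in `text` and sorts before every other symbol.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def r_star(text: str) -> int:
    """Sum of the BWT run counts of the text and of its reverse."""
    return count_runs(bwt(text)) + count_runs(bwt(text[::-1]))


def lz77_phrase_count(text: str) -> int:
    """Number of phrases in a greedy LZ77 parse (naive cubic-time scan).

    Assumed variant: each phrase is the longest prefix of the remaining text
    that also starts at an earlier position (the copy may overlap the phrase),
    followed by one fresh character if any text remains.
    """
    n = len(text)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            longest = max(longest, k)
        i += longest + 1  # copied part plus one new character
        phrases += 1
    return phrases


if __name__ == "__main__":
    sample = "banana"
    print("r* =", r_star(sample))
    print("z  =", lz77_phrase_count(sample))
```

Under these conventions, `banana` gives $r^* = 9$ and $z = 4$. Highly repetitive texts keep both values small relative to $n$, which is the regime where a bound of $O(r^* \log (n / r^*) + z \log n)$ bits is attractive.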