On the near-tightness of $χ\leq 2r$: a general $σ$-ary construction and a binary case via LFSRs
By: Vinicius T. V. Date, Leandro M. Zatesko
In the field of compressed string indexes, recent work has introduced suffixient sets and their corresponding repetitiveness measure $χ$. In particular, researchers have explored its relationship to other repetitiveness measures, notably $r$, the number of runs in the Burrows--Wheeler Transform (BWT) of a string. Navarro et al. (2025) proved that $χ\leq 2r$, although empirical results by Cenzato et al. (2024) suggest that this bound is loose, with real data bounding $χ$ by around $1.13r$ to $1.33r$ when the size of the alphabet is $σ= 4$. To better understand this gap, we present two cases for the asymptotic tightness of the $χ\leq 2r$ bound: a general construction for arbitrary $σ$ values, and a binary alphabet case, consisting of de Bruijn sequences constructed by linear-feedback shift registers (LFSRs) from primitive polynomials over $\mathbb{F}_2$. The second is a novel characterization of which de Bruijn sequences achieve the literature run-minimal pattern for the cyclic BWT. Moreover, we show that de Bruijn sequences fail to close the gap for $σ\geq 3$.
Similar Papers
r*-indexing
Data Structures and Algorithms
Find words in text super fast.
Merging RLBWTs adaptively
Data Structures and Algorithms
Merges compressed text data much faster.
R-enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies
Data Structures and Algorithms
Finds patterns in text faster and with less memory.