Unclustered BWTs of any Length over Non-Binary Alphabets
By: Gabriele Fici , Estéban Gabory , Giuseppe Romana and more
Potential Business Impact:
Finds longest possible patterns in text data.
We prove that for every integer $n > 0$ and for every alphabet $\Sigma_k$ of size $k \geq 3$, there exists a necklace of length $n$ whose Burrows-Wheeler Transform (BWT) is completely unclustered, i.e., it consists of exactly $n$ runs with no two consecutive equal symbols. These words represent the worst-case behavior of the BWT for clustering, since the number of BWT runs is maximized. We also establish a lower bound on their number. This contrasts with the binary case, where the existence of infinitely many completely unclustered BWTs is still an open problem, related to Artin's conjecture on primitive roots.
Similar Papers
Decomposing Words for Enhanced Compression: Exploring the Number of Runs in the Extended Burrows-Wheeler Transform
Data Structures and Algorithms
Finds better ways to shrink computer files.
Merging RLBWTs adaptively
Data Structures and Algorithms
Merges compressed text data much faster.
Fast and memory-efficient BWT construction of repetitive texts using Lyndon grammars
Data Structures and Algorithms
Finds patterns in huge data faster.