Score: 0

Unclustered BWTs of any Length over Non-Binary Alphabets

Published: August 28, 2025 | arXiv ID: 2508.20879v1

By: Gabriele Fici , Estéban Gabory , Giuseppe Romana and more

Potential Business Impact:

Finds longest possible patterns in text data.

Business Areas:
A/B Testing Data and Analytics

We prove that for every integer $n > 0$ and for every alphabet $\Sigma_k$ of size $k \geq 3$, there exists a necklace of length $n$ whose Burrows-Wheeler Transform (BWT) is completely unclustered, i.e., it consists of exactly $n$ runs with no two consecutive equal symbols. These words represent the worst-case behavior of the BWT for clustering, since the number of BWT runs is maximized. We also establish a lower bound on their number. This contrasts with the binary case, where the existence of infinitely many completely unclustered BWTs is still an open problem, related to Artin's conjecture on primitive roots.

Page Count
15 pages

Category
Computer Science:
Discrete Mathematics