Merging RLBWTs adaptively
By: Travis Gagie
Potential Business Impact:
Merges compressed text data much faster.
We show how to merge run-length compressed Burrows-Wheeler Transforms (RLBWTs) quickly and in $O(R)$ space, where $R$ is the total number of runs in them, when a certain parameter is small. Specifically, we consider the boundaries in their combined extended Burrows-Wheeler Transform (eBWT) between blocks of characters from the same original RLBWT, and denote by $L$ the sum of the longest common prefix (LCP) values at those boundaries. We show how to merge the RLBWTs in $\tilde{O}(L + \sigma + R)$ time, where $\sigma$ is the alphabet size. We conjecture that $L$ tends to be small when the strings (or sets of strings) underlying the original RLBWTs are repetitive but dissimilar.
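To make the parameter $L$ concrete, the following is a minimal brute-force sketch, not the algorithm from the paper. It takes two plain strings, sorts all of their rotations in the order of their infinite repetitions, reads off the combined eBWT, and sums the LCP values wherever adjacent rows come from different inputs. The names `rotations`, `omega_prefix`, `lcp` and `merge_naive` are hypothetical. The sketch assumes each input is a single primitive string and that the two inputs are not rotations of each other, so comparing prefixes of length `len(a) + len(b)` is enough to order the rows. It uses quadratic space and works on uncompressed text, unlike the $O(R)$-space method described above.

```python
def rotations(s, src):
    """All rotations of s, each tagged with the index of its source string."""
    return [(s[i:] + s[:i], src) for i in range(len(s))]


def omega_prefix(rot, width):
    """Prefix of length `width` of the infinite repetition rot rot rot ..."""
    reps = -(-width // len(rot))  # ceiling division
    return (rot * reps)[:width]


def lcp(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def merge_naive(a, b):
    """Return (combined eBWT, L) for the two strings a and b.

    Assumption: a and b are primitive and not rotations of each other,
    so prefixes of length len(a) + len(b) order the rotations correctly.
    """
    width = len(a) + len(b)
    rows = sorted(
        (omega_prefix(rot, width), src, rot[-1])
        for rot, src in rotations(a, 0) + rotations(b, 1)
    )
    # The eBWT character of a row is the last character of its rotation.
    ebwt = "".join(last for _, _, last in rows)
    # A boundary is a pair of adjacent rows that come from different inputs.
    total = sum(
        lcp(p[0], q[0]) for p, q in zip(rows, rows[1:]) if p[1] != q[1]
    )
    return ebwt, total


if __name__ == "__main__":
    print(merge_naive("abc", "abd"))
```

For the inputs `"abc"` and `"abd"` the sorted rows alternate between the two sources at every position, so every adjacent pair is a boundary. Inputs that share long substrings drive $L$ up, while dissimilar inputs keep the boundary LCP values small.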