Random Access in DNA Storage: Algorithms, Constructions, and Bounds
By: Chen Wang, Eitan Yaakobi
Potential Business Impact:
Reduces the number of sequencing reads needed to retrieve a specific piece of data from DNA storage.
As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational costs and retrieval latency. This paper addresses the recently studied Random Access Problem, which evaluates the expected number of read samples required to recover a specific information strand from $n$ encoded strands. We propose a novel algorithm to compute the exact expected number of reads, achieving a computational complexity of $O(n)$ for fixed field size $q$ and information length $k$. Furthermore, we derive explicit formulas for the average and maximum expected number of reads, enabling an efficient search for optimal generator matrices under small parameters. Beyond theoretical analysis, we present new code constructions that improve the best-known upper bound from $0.8815k$ to $0.8811k$ for $k=3$, and achieve an upper bound of $0.8629k$ for $k=4$, both for sufficiently large $q$. We also establish a tighter theoretical lower bound on the expected number of reads that improves upon state-of-the-art bounds. In particular, this bound establishes the optimality of the simple parity code for the case of $n=k+1$ over any alphabet size $q$.
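To make the quantity in the Random Access Problem concrete, the following is a minimal brute-force sketch, not the $O(n)$ algorithm proposed in the paper. It assumes the model commonly used for this problem, which the abstract does not spell out: each read returns one of the $n$ encoded strands uniformly at random with replacement, and information strand $i$ is recovered once the unit vector $e_i$ lies in the span of the generator-matrix columns read so far. The sketch is restricted to $q=2$, with columns stored as integer bitmasks; the function names `in_span_gf2` and `expected_reads` are hypothetical, and the running time is exponential in $n$.

```python
from fractions import Fraction
from functools import lru_cache


def in_span_gf2(vectors, target):
    """Return True if `target` is in the GF(2) span of `vectors` (bitmask ints)."""
    basis = []  # basis vectors with pairwise distinct leading bits
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v if it is set
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0


def expected_reads(columns, target):
    """Exact expected number of uniform reads (with replacement, an assumed
    model) until `target` is in the span of the columns read so far."""
    n = len(columns)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def expect(seen):
        held = [columns[j] for j in range(n) if (seen >> j) & 1]
        if in_span_gf2(held, target):
            return Fraction(0)
        if seen == full:
            raise ValueError("target is not recoverable from these columns")
        unseen = [j for j in range(n) if not (seen >> j) & 1]
        # E(S) = 1 + (|S|/n) E(S) + (1/n) * sum over unseen j of E(S + j),
        # which rearranges to E(S) = (n + sum) / (n - |S|).
        total = Fraction(n) + sum(expect(seen | (1 << j)) for j in unseen)
        return total / len(unseen)

    return expect(0)


if __name__ == "__main__":
    e1 = 0b01
    print(expected_reads([0b01, 0b10], e1))        # uncoded, k = 2
    print(expected_reads([0b01, 0b10, 0b11], e1))  # parity code, k = 2, n = 3
```

Under these assumptions, both the uncoded scheme and the binary parity code with $k=2$ give an expectation of exactly 2 reads for the first information strand. Exact rational arithmetic via `Fraction` avoids rounding error when comparing small generator matrices.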
Similar Papers
Making it to First: The Random Access Problem in DNA Storage
Information Theory
Finds data faster in DNA storage.
The Random Variables of the DNA Coverage Depth Problem
Information Theory
Stores more computer information in tiny DNA.
The Coverage Depth Problem in DNA Storage Over Small Alphabets
Information Theory
Stores more computer data in tiny DNA.