Random Access in DNA Storage: Algorithms, Constructions, and Bounds

Published: January 11, 2026 | arXiv ID: 2601.07053v1

By: Chen Wang, Eitan Yaakobi

Potential Business Impact:

Stores more computer data in tiny DNA strands.

Business Areas:

Bioinformatics, Biotechnology, Data and Analytics, Science and Engineering

As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational costs and retrieval latency. This paper addresses the recently studied Random Access Problem, which evaluates the expected number of read samples required to recover a specific information strand from $n$ encoded strands. We propose a novel algorithm to compute the exact expected number of reads, achieving a computational complexity of $O(n)$ for fixed field size $q$ and information length $k$. Furthermore, we derive explicit formulas for the average and maximum expected number of reads, enabling an efficient search for optimal generator matrices for small parameters. Beyond theoretical analysis, we present new code constructions that improve the best-known upper bound from $0.8815k$ to $0.8811k$ for $k=3$, and achieve an upper bound of $0.8629k$ for $k=4$ for sufficiently large $q$. We also establish a tighter theoretical lower bound on the expected number of reads that improves upon state-of-the-art bounds. In particular, this bound establishes the optimality of the simple parity code for the case of $n=k+1$ for every alphabet size $q$.
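
To make the quantity studied above concrete, the following is a minimal brute-force sketch of the expected number of reads needed to recover one information strand. It is not the paper's $O(n)$ algorithm: it enumerates subsets of strands and is exponential in $n$. It assumes that each read draws one of the $n$ encoded strands uniformly at random with replacement, that the field is binary ($q=2$), and that each strand is a column of the generator matrix stored as a $k$-bit integer. The names `in_span` and `expected_reads` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from functools import lru_cache


def in_span(vectors, target):
    """Return True if `target` lies in the GF(2) span of `vectors` (bitmask ints)."""
    basis = []  # each basis vector has a leading bit that no later one contains
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v when it is set
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0


def expected_reads(columns, target):
    """Exact expected number of uniform reads until `target` is decodable.

    columns: generator-matrix columns over GF(2), one k-bit integer per strand.
    target:  the unit vector (as a bitmask) of the wanted information strand.
    """
    n = len(columns)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def e(mask):
        # mask records which distinct strands have been read so far.
        seen = [columns[j] for j in range(n) if mask >> j & 1]
        if in_span(seen, target):
            return Fraction(0)
        if mask == full:
            raise ValueError("target strand is not recoverable from this code")
        unseen = [j for j in range(n) if not mask >> j & 1]
        # From E = 1 + (|S|/n) E + (1/n) * sum of E over states with one new strand.
        return (n + sum(e(mask | 1 << j) for j in unseen)) / Fraction(len(unseen))

    return e(0)


# Uncoded storage with k = 3: the wanted strand is one of three equally likely reads.
print(expected_reads([0b001, 0b010, 0b100], 0b001))  # 3
# Binary parity code with k = 2, n = 3 (columns e1, e2, e1 + e2).
print(expected_reads([0b01, 0b10, 0b11], 0b01))      # 2
```

Exact rational arithmetic via `Fraction` keeps the results free of rounding error, which matters when comparing codes whose expected read counts differ only in the fourth decimal place, as in the bounds quoted above.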