Block Length Gain for Nanopore Channels
By: Yu-Ting Lin, Hsin-Po Wang, Venkatesan Guruswami
Potential Business Impact:
Stores more computer data safely in DNA.
DNA is an attractive candidate for data storage. Its millennial durability and nanometer scale offer exceptional data density and longevity. Its relevance to medical applications also drives advances in DNA-related biotechnology. To protect our data against errors, a straightforward approach uses one error-correcting code per DNA strand, with a Reed--Solomon code protecting the collection of strands. A downside is that current technology can only synthesize strands 200--300 nucleotides long. At this block length, the inner code rate suffers a significant finite-length penalty, making its effective capacity hard to characterize. Last year, we proposed $\textit{Geno-Weaving}$ in a JSAIT publication. The idea is to protect the same position across multiple strands using one code; this provably achieves capacity against substitution errors. In this paper, we extend the idea to combat deletion errors and show two more advantages of Geno-Weaving: (1) Because the number of strands is 3--4 orders of magnitude larger than the strand length, the finite-length penalty vanishes. (2) At realistic deletion rates $0.1\%$--$10\%$, Geno-Weaving designed for BSCs works well empirically, bypassing the need to tailor the design for deletion channels.
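The finite-length penalty mentioned in the abstract can be illustrated with the standard normal approximation for a binary symmetric channel, $R \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$, where $C$ is the capacity, $V$ the channel dispersion, $n$ the block length, and $\epsilon$ the target block error probability. The sketch below is not the paper's analysis. It assumes a BSC with crossover probability 0.01 and a target error of $10^{-3}$, treats the block length as a count of binary channel uses, and omits the higher-order $\log n$ term. The function names and the block lengths 250 (one strand) and 250,000 (three orders of magnitude more, as when coding across strands) are illustrative choices.

```python
from math import log2, sqrt
from statistics import NormalDist


def bsc_capacity(p: float) -> float:
    """Shannon capacity of a BSC with crossover probability p (bits per use)."""
    binary_entropy = -p * log2(p) - (1 - p) * log2(1 - p)
    return 1.0 - binary_entropy


def bsc_dispersion(p: float) -> float:
    """Channel dispersion V of a BSC (bits squared per use)."""
    return p * (1 - p) * log2((1 - p) / p) ** 2


def finite_length_penalty(n: int, p: float, eps: float) -> float:
    """Second-order rate loss sqrt(V/n) * Q^{-1}(eps) at block length n."""
    q_inverse = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps) for a standard normal
    return sqrt(bsc_dispersion(p) / n) * q_inverse


# Illustrative assumptions, not values taken from the paper.
CROSSOVER = 0.01
TARGET_ERROR = 1e-3
STRAND_LENGTH = 250          # coding within one strand
STRAND_COUNT = 250_000       # coding one position across many strands

for block_length in (STRAND_LENGTH, STRAND_COUNT):
    penalty = finite_length_penalty(block_length, CROSSOVER, TARGET_ERROR)
    rate = bsc_capacity(CROSSOVER) - penalty
    print(f"n = {block_length:>7}: penalty = {penalty:.4f}, approx. rate = {rate:.4f}")
```

Because the penalty scales as $1/\sqrt{n}$, a block length three orders of magnitude larger shrinks it by a factor of about $\sqrt{1000} \approx 32$ under these assumptions.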