Synthetic DNA is a durable, high-density information storage platform based on DNA nanostructures. However, errors during DNA reading pose challenges to data integrity. Conventional error-correcting codes ensure data integrity by adding redundancy during encoding, which reduces storage density and increases costs. Here, we present an integrated error correction (IEC) algorithm that synergistically combines three enhanced mechanisms: "head-tail" region Levenshtein distance for error-tolerant clustering (10× faster); sliding window-optimized Hamming distance for detecting and correcting insertions and deletions without length constraints; and score-weighted majority voting for optimal sequence selection (2% higher accuracy). Together, these mechanisms enhance storage density and decoding efficiency. We confirmed the effectiveness of IEC by recovering medical data encoded in DNA with errors. With IEC, we can simultaneously correct insertion, deletion, and substitution errors at a redundancy rate of 2.4%, whereas the current minimum redundancy rate is 7%. We thus achieved a logical density of 1.4 bits per nucleotide. Additionally, IEC maintains high fidelity during decoding, producing sequences that closely match the encoded sequences, and reduces the number of sequences by 3 orders of magnitude, which minimizes computational overhead and runtime and improves decoding efficiency.
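To make two of the mechanisms named above concrete, the following is a minimal Python sketch of head-tail Levenshtein clustering and score-weighted majority voting. It is an illustration under stated assumptions, not the IEC implementation: the function names, the region length `k`, the distance `threshold`, the greedy assignment to the first matching cluster, and the idea of summing per-read scores for identical candidate sequences are all hypothetical choices made here. The sliding-window Hamming step is not sketched.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def head_tail_distance(a: str, b: str, k: int = 8) -> int:
    """Compare only the first and last k bases instead of the full reads.

    Assumption: k and the simple sum of the two distances are illustrative.
    """
    return levenshtein(a[:k], b[:k]) + levenshtein(a[-k:], b[-k:])


def cluster_reads(reads, k: int = 8, threshold: int = 3):
    """Greedy clustering: a read joins the first cluster whose representative
    (its first member) is within `threshold` on the head-tail distance."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if head_tail_distance(read, cluster[0], k) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def weighted_vote(candidates, scores):
    """Pick the candidate sequence with the largest total score.

    Assumption: each read carries one numeric score (hypothetical, for
    example a quality value); identical sequences pool their scores.
    """
    votes = defaultdict(float)
    for seq, score in zip(candidates, scores):
        votes[seq] += score
    return max(votes, key=votes.get)
```

For example, `cluster_reads(["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "TTGGCCAATT", "TTGGCCAATT"], k=4, threshold=1)` groups the first three reads together because their 4-base heads and tails are identical, even though the third read carries a substitution in the middle. Restricting the comparison to short end regions is what keeps the clustering cost low relative to a full-length edit distance.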
Mao et al. studied this question.