We propose a coding method that transforms binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C, and G, that satisfy the following two properties: 1) run-length constraint: the maximum run-length of each symbol in each codeword is at most three; and 2) GC-content constraint: the GC-content of each codeword is close to 0.5, say between 0.4 and 0.6. The proposed coding scheme is motivated by the problem of designing codes for DNA-based data storage systems, where binary digital data is stored in synthetic DNA base sequences. Existing methods either achieve code rates no greater than 1.78 bits per nucleotide or suffer from severe error propagation. Our method achieves a rate of 1.9 bits per DNA base with low encoding/decoding complexity and limited error propagation.
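The two codeword constraints stated above can be checked mechanically. The following is a minimal sketch of such a validity check only, not the encoding scheme itself; the function names (`max_run_length`, `gc_content`, `satisfies_constraints`) and the default thresholds are illustrative assumptions taken from the numbers quoted in the abstract (run-length at most three, GC-content between 0.4 and 0.6).

```python
from itertools import groupby


def max_run_length(seq: str) -> int:
    """Length of the longest run of identical consecutive symbols."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of symbols in the sequence that are G or C."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check the run-length and GC-content constraints on one codeword.

    The thresholds are assumptions mirroring the abstract's example values.
    """
    if not seq or set(seq) - set("ATCG"):
        return False  # empty input or symbols outside the DNA alphabet
    return (max_run_length(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)
```

For instance, `satisfies_constraints("ACGTACGTAC")` holds (no repeated symbols, GC-content 0.5), whereas `"AAAACGCGCG"` fails the run-length constraint and `"AAATTTAAAT"` fails the GC-content constraint.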
Song et al. (Wed,) studied this question.