May 31, 2024 · Open Access

Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems

Abstract

This study introduces a novel model for analyzing and determining the sequencing coverage required in DNA-based data storage, focusing on combinatorial DNA encoding. We seek to characterize the distribution of the number of sequencing reads required for message reconstruction, using a variant of the coupon collector distribution for this purpose. For any given number of observed reads R ∈ ℕ, we use a Markov chain representation of the process to compute the probability of error-free reconstruction. We derive theoretical bounds on the decoding probability and use empirical simulations to validate these bounds and assess their tightness. This work contributes to the understanding of sequencing coverage in DNA-based data storage, offering insights into decoding complexity, error correction, and sequence reconstruction. We also provide a Python package whose inputs are the code design and other message parameters, jointly denoted Θ, together with a desired confidence level 1−δ. The package computes the required read coverage R = R(δ, Θ) that guarantees message reconstruction with probability at least 1−δ.
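The abstract describes two computations: the probability of reconstruction after R reads, obtained from a Markov chain over a coupon-collector variant, and the smallest R that reaches a confidence level 1−δ. The sketch below illustrates both under simplifying assumptions that are ours, not the paper's: each combinatorial letter is a uniform mixture of k distinct building blocks, each read samples one block per letter independently, a letter is resolved once all k of its blocks have been seen, and the message has ell independent letters. It is not the authors' package, and the names step, prob_letter_resolved and required_reads are hypothetical.

```python
"""Hypothetical sketch: coupon-collector read coverage via a Markov chain.

Assumptions (illustrative, not taken from the paper): each combinatorial
letter is a uniform mixture of k distinct building blocks; every read
samples one block per letter independently; a letter is resolved once
all k of its blocks have been observed; the message has ell letters.
"""


def step(dist: list[float], k: int) -> list[float]:
    """Advance the per-letter chain by one read.

    State j = number of distinct blocks of a letter seen so far (0..k).
    A read reveals a new block w.p. (k - j) / k; otherwise the state stays.
    State k is absorbing because (k - k) / k == 0.
    """
    new = [0.0] * (k + 1)
    for j, p in enumerate(dist):
        if p == 0.0:
            continue
        advance = (k - j) / k
        new[j] += p * (1.0 - advance)
        if j < k:
            new[j + 1] += p * advance
    return new


def prob_letter_resolved(k: int, r: int) -> float:
    """P(all k blocks of one letter observed after r reads)."""
    dist = [1.0] + [0.0] * k  # start in state 0 with probability 1
    for _ in range(r):
        dist = step(dist, k)
    return dist[k]


def required_reads(k: int, ell: int, delta: float, r_max: int = 100_000) -> int:
    """Smallest r such that all ell letters are resolved w.p. >= 1 - delta."""
    if k < 1 or ell < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need k >= 1, ell >= 1 and 0 < delta < 1")
    dist = [1.0] + [0.0] * k
    for r in range(r_max + 1):
        # Letters are assumed independent, so the message-level success
        # probability is the per-letter probability raised to the power ell.
        if dist[k] ** ell >= 1.0 - delta:
            return r
        dist = step(dist, k)
    raise ValueError(f"target not reached within r_max={r_max} reads")


if __name__ == "__main__":
    print(required_reads(k=4, ell=10, delta=0.05))
```

Because required_reads advances the chain one read at a time, each candidate R costs O(k) work instead of recomputing the distribution from scratch. Under these assumptions the per-letter probability also has the inclusion-exclusion closed form Σ_{i=0}^{k} (−1)^i C(k, i) (1 − i/k)^r, which is a convenient cross-check for the chain.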

Cite This Study

Preuss et al. (2024). Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems.

synapsesocial.com/papers/68e67624b6db64358760057d
https://doi.org/10.1109/tmbmc.2024.3408053