Abstract

Shotgun metagenomics can provide both taxonomic and functional insights, but benchmarking is necessary to determine the sequencing depth appropriate for specific analyses. Here we used complex mixtures of DNA from cultured bacteria and analysed taxonomic composition, strain-level resolution and functional profiles at up to 11 sequencing depths (0.1–50.0 Gb). Reference-based analysis provided accurate strain-level taxonomy at 0.5–1.0 Gb. By contrast, de novo metagenome-assembled genome (MAG) reconstruction required deep sequencing (>10 Gb), and even MAGs deemed high quality by standard metrics were chimeric, with 54.5–81.8% accurately representing original strains, depending on the bioinformatic approach. Functionally, 2 Gb provided reliable insights at the pathway level for each of the mock communities tested, but sufficient proteome coverage was achieved only at or above 10 Gb. Library preparation and host DNA contamination were identified as confounders in shallow metagenomic analysis. This analysis highlights the potential and limitations of shallow metagenomics and provides guidance to accurately capture strain-level diversity using MAGs.
Treichel et al. (Tue,) studied this question.
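
The depth guidance reported in the abstract can be read as a simple lookup from sequencing depth to the analyses it is likely to support. The sketch below is illustrative only: the names `DEPTH_THRESHOLDS_GB` and `supported_analyses` are hypothetical, the thresholds are copied from the abstract, and using the upper end (1.0 Gb) of the reported 0.5–1.0 Gb range for strain-level taxonomy is a conservative assumption rather than a recommendation from the study. The values apply to the mock communities tested and may not transfer to other sample types.

```python
from typing import List

# Minimum sequencing depth (Gb) per analysis, as reported in the abstract.
# Each value is (threshold in Gb, whether the threshold itself is sufficient).
# The key names are illustrative and not taken from the study.
DEPTH_THRESHOLDS_GB = {
    # Reported as accurate at 0.5-1.0 Gb; the upper end is used here (assumption).
    "reference_based_strain_taxonomy": (1.0, True),
    # Reliable pathway-level insights were reported at 2 Gb.
    "pathway_level_function": (2.0, True),
    # Sufficient proteome coverage was reported "at or above 10 Gb".
    "proteome_coverage": (10.0, True),
    # De novo MAG reconstruction was reported to require more than 10 Gb.
    "mag_reconstruction": (10.0, False),
}


def supported_analyses(depth_gb: float) -> List[str]:
    """Return the analyses whose reported depth threshold is met."""
    supported = []
    for name, (threshold, inclusive) in DEPTH_THRESHOLDS_GB.items():
        if depth_gb > threshold or (inclusive and depth_gb == threshold):
            supported.append(name)
    return supported
```

For example, `supported_analyses(2.0)` returns the reference-based taxonomy and pathway-level entries only. Meeting the MAG threshold does not imply that the resulting MAGs are free of chimerism, since the abstract reports that only 54.5–81.8% of high-quality MAGs accurately represented the original strains.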