Abstract

Background
Genome contamination is a well-known issue in (meta)genomics. Although it has received considerable attention, and an increasing number of detection tools have become available over the years, no comparison between these tools exists in the literature.

Results
Here, we benchmark six of the most popular tools using a simulation framework. Our simulations were conducted at six different taxonomic ranks, from phylum to species. The analysis of the estimated contamination levels indicates that the tools' precision is poor. This is often caused by large overdetection, but also by underdetection, especially at the genus and species ranks. Furthermore, our results show that only redundant contamination is accurately estimated.

Conclusion
Our results indicate that a combination of tools, including Kraken2, is necessary to estimate the contamination level accurately. We also provide a freely available contamination simulation framework, CRACOT, which may be useful for estimating the accuracy of future algorithms.
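To make "overdetection" and "underdetection" concrete, here is a minimal Python sketch that compares a tool's estimated contamination level with the level injected by a simulation. The function names, the percentage scale, the tool names, and the 1-percentage-point tolerance are all illustrative assumptions. They are not CRACOT's interface or the criterion used in the paper.

```python
def classify_estimate(estimated: float, simulated: float, tolerance: float = 1.0) -> str:
    """Label one contamination estimate (in %) against the simulated ground truth.

    Hypothetical helper: `tolerance` is an assumed threshold in percentage points.
    """
    error = estimated - simulated
    if error > tolerance:
        return "overdetection"   # tool reports more contamination than was injected
    if error < -tolerance:
        return "underdetection"  # tool misses part of the injected contamination
    return "accurate"


def summarize(estimates: dict, simulated: float, tolerance: float = 1.0) -> dict:
    """Classify the estimate of each (hypothetical) tool for one simulated genome."""
    return {
        tool: classify_estimate(value, simulated, tolerance)
        for tool, value in estimates.items()
    }


if __name__ == "__main__":
    # Made-up estimates for a genome simulated with 5% contamination.
    estimates = {"tool_a": 12.0, "tool_b": 5.4, "tool_c": 1.0}
    print(summarize(estimates, simulated=5.0))
```

With these made-up numbers, `tool_a` is labeled as overdetection, `tool_b` as accurate, and `tool_c` as underdetection. A real benchmark would repeat this comparison across many simulated genomes and taxonomic ranks.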
Cornet et al. studied this question.