Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods are not always effective enough. In this study, we present a novel approach to metagenomic identification that uses data compressors as features for taxonomic classification. By evaluating a comprehensive set of compressors, both general-purpose and genomic-specific, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that combining features from multiple compressors can help identify taxonomy. The method achieved an overall accuracy of 95% on an imbalanced dataset that includes classes with limited samples. The study also showed that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification. This approach offers a significant advancement in the field of metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at https://github.com/ieeta-pt/xgTaxonomy.
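The idea of using compressors as features can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). It assumes only three general-purpose compressors from the Python standard library, and the names `COMPRESSORS` and `compression_features` are hypothetical. Each sequence is mapped to its compressed size in bits per base under each compressor, giving a small feature vector that could be passed to a classifier.

```python
import bz2
import lzma
import random
import zlib

# Assumption: three general-purpose compressors from the standard library.
# The study also evaluates genomic-specific compressors, which are not shown here.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def compression_features(sequence: str) -> dict[str, float]:
    """Return the compressed size in bits per base for each compressor."""
    data = sequence.upper().encode("ascii")
    if not data:
        raise ValueError("sequence must not be empty")
    return {
        name: 8 * len(compress(data)) / len(data)
        for name, compress in COMPRESSORS.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    repetitive = "ACGT" * 2500  # highly regular toy sequence
    shuffled = "".join(rng.choice("ACGT") for _ in range(10_000))  # no structure
    print("repetitive:", compression_features(repetitive))
    print("shuffled:  ", compression_features(shuffled))
```

A highly repetitive sequence should yield far fewer bits per base than a random one. In this sketch, it is the pattern of such differences across compressors that a classifier would learn from.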
Jorge Miguel Silva
João Rafael Almeida
Artificial Intelligence in Medicine
University of Aveiro
Silva et al. studied this question.
synapsesocial.com/papers/68e5c453b6db64358755a987 — DOI: https://doi.org/10.1016/j.artmed.2024.102948