Contig binning is a necessary step in reconstructing metagenome-assembled genomes. Current metagenomic contig binning methods often leverage coverage profiles across multiple related metagenomes and have demonstrated strong performance on co-assembled contigs. However, in single-sample scenarios, where coverage information is scarce, their performance drops significantly, which limits in-depth metagenomic analysis at the individual-sample level. To address this issue, we propose DCVBin, a novel single-sample metagenomic contig binning method that incorporates semantic features extracted from a DNA language model. Specifically, our approach continues pretraining a DNA language model to capture more domain-specific semantic representations, which are then integrated with 4-mer frequencies using a variational autoencoder. Clustering is subsequently performed with the k-means algorithm, in which the number of clusters is determined from single-copy genes. Experimental results on six publicly available datasets demonstrate that DCVBin achieves high-accuracy single-sample metagenomic binning and outperforms other state-of-the-art methods. Furthermore, DCVBin is incorporated into a disease diagnostic framework that is evaluated on a cohort of gut metagenomes from people with colorectal cancer and healthy controls. The framework is shown to predict colorectal cancer accurately from gut metagenomes and identifies a list of potential microbial biomarkers.
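The 4-mer frequencies mentioned in the abstract can be illustrated with a short sketch. This is not the DCVBin implementation: the function name `tetranucleotide_frequencies`, the choice to skip windows containing ambiguous bases, and the normalisation to relative frequencies are all assumptions made for illustration. The abstract does not specify these details, nor whether reverse-complement 4-mers are merged.

```python
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequencies(contig: str) -> list[float]:
    """Return a 256-dimensional relative 4-mer frequency vector.

    Hypothetical helper: windows containing characters other than
    A, C, G, T (for example N) are skipped, which is an assumption.
    """
    counts = [0] * len(KMERS)
    seq = contig.upper()
    # Slide a window of width 4 over the contig, one base at a time.
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        # Contig shorter than 4 bases, or no unambiguous window.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]
```

A vector like this would form one part of the input to the variational autoencoder, alongside the language-model features; how the two are combined is described in the paper itself, not in this sketch.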
Wáng et al. studied this question.
synapsesocial.com/papers/6a0ea17cbe05d6e3efb60294 — DOI: https://doi.org/10.1093/bib/bbag241
Yì Wáng
University of Stuttgart
Yifan Liu
Jilin University
F M Liu
Union Hospital
Briefings in Bioinformatics
Jilin University
Jilin Jianzhu University
Jilin Engineering Normal University