Abstract

Microorganisms in extreme environments represent a promising source of novel metabolites, yet their global diversity and biosynthetic potential remain underexplored. Here, we reconstruct 78,213 bacterial and archaeal genomes from 2293 publicly available metagenomes and 3214 microbial isolates to establish a unified database, the Extreme Environment Microbiome Catalog (EEMC). The EEMC expands known global phylogenetic diversity, encompassing 32,715 representative species and nearly 4 billion non-redundant genes, 63.00% and 19.21% of which, respectively, were previously unannotated. It also comprises 163,693 biosynthetic gene clusters, grouped into 64,733 gene cluster families, 58.68% of which are classified as novel, underscoring the functional diversity of microbial communities across extreme habitats. We further develop protein large language models to predict genome-encoded candidate antimicrobial peptides (cAMPs) from the EEMC, identifying 3032 non-toxic candidates. Of 100 synthesized peptides, 84% demonstrate antibacterial activity, and all 50 tested cAMPs exhibit low cytotoxicity. Notably, six of the most potent cAMPs show significant efficacy against multidrug-resistant Gram-negative pathogens in vitro, indicating their biomedical potential. Together, our study establishes the EEMC as a foundational resource for uncovering novel microbial lineages and biosynthetic capabilities, highlighting its potential for drug discovery and for future advances in biotechnology and biomedicine.
Jiang et al. studied this question.