The bacterial pangenome contains a vast diversity of antiphage systems, whose overall extent is still unknown. In this study, we developed complementary machine learning approaches to systematically predict antiphage function from genomic context, protein sequence, or their combination, achieving up to 99% precision and 92% recall. We validated these models experimentally in Escherichia and Streptomyces with the discovery of 12 antiphage systems. Applied to over 32,000 bacterial genomes, these models expand the predicted antiphage repertoire, with ~1.5% of bacterial genomes devoted to defense and more than 85% of predicted protein families remaining uncharacterized. We provide an interactive catalog of more than 19,000 candidate operon families for experimental follow-up. Together, these findings show that most molecular diversity in bacterial immunity remains uncharacterized and provide a foundation for its systematic exploration.
Mordret et al. studied this question.