Antiphage defense systems protect bacteria from viral infection and have inspired important biotechnologies such as CRISPR-Cas9 while also revealing the evolutionary roots of eukaryotic innate immunity. Many systems have been discovered by genomic colocalization, but this approach cannot identify systems outside of defense islands. We present DefensePredictor, a machine learning model that uses protein language model embeddings to classify proteins as defensive. Applying DefensePredictor to 69 diverse Escherichia coli strains, we predicted hundreds of previously unknown systems and experimentally validated 42 of them. Analysis of 1000 diverse prokaryotic genomes identified nearly 3000 protein clusters lacking homology to known systems, revealing a vast, uncharacterized defense repertoire. DefensePredictor will facilitate the comprehensive discovery of antiphage defense systems, which promises to reveal additional connections between prokaryotic and eukaryotic immunity and accelerate biotechnology development.
DeWeirdt et al. studied this question.