Viral sequences in diverse environments remain largely uncharacterized, impeding our understanding of their genetic makeup, biological interactions, and potential applications. This underscores an urgent need for innovative analytical methods. Here, we present the VirHost Hunter framework, which uses phage tail and lysin proteins for efficient, high-resolution host assignment without requiring full genomes. By harnessing Protein Language Models and Vision Transformers, VirHost Hunter captures functional homology between proteins despite sequence dissimilarity, significantly boosting prediction accuracy. When applied to disease-associated gut bacteria, the calibrated VirHost Hunter surpasses existing methods, doubling the number of phage host assignments, expanding taxonomic reach, and revealing previously uncharacterized phages that target gut bacteria, including Akkermansia and Prevotella. We further establish a gut phage lysin database, which enabled the synthesis of a lysin that effectively and specifically targets an obesity-promoting bacterium. VirHost Hunter’s precision and scalability mark a significant advance in virome research and present a promising avenue for microbiome therapies. Here, the authors present VirHost Hunter, an AI-based approach to phage–host assignment using tail and lysin proteins, showing that it improves host resolution, expands functional discovery of gut phages, and enables targeted lysin identification for microbiome research.
Du et al. studied this question.