Abstract

Background: Accurate prediction of CD4 T-cell epitopes is essential for vaccine design and immunotherapy development but remains challenging due to MHC class II polymorphism and the complexity of antigen presentation. We present a foundation-model-based framework that integrates structural representations from AlphaFold 3 (AF3) and sequence embeddings from ESM2 within a graph neural network (GNN) to predict peptide:MHC II binding.

Model and Experimental Procedures: For each peptide:MHC II complex, AF3 was used to generate a 3D structural model, from which we constructed a residue-level graph whose edges represent geometric proximity. Node features combined AF3-derived spatial descriptors with contextual ESM2 embeddings for each amino-acid token in the peptide sequence. The hybrid GNN was trained on experimental data (∼52,000 non-binders and 633 binders) from a multiplexed MHCII-PepSeq assay, a high-throughput platform that directly measures binding of thousands of synthetic peptides across diverse MHC class II molecules. The model was evaluated on an independent held-out test set encompassing multiple HLA alleles.

Results: The AF3 + ESM2 GNN achieved an AUC of 0.782 (95% CI: 0.751-0.813), matching the performance of NetMHCIIpan 4.3, a leading model for peptide:MHC II binding prediction on the Immune Epitope Database (IEDB) benchmark, despite being trained on a dataset that is orders of magnitude smaller. An ensemble combining NetMHCIIpan with the AF3 + ESM2 model further improved performance, reaching an AUC of 0.810. The AF3-only GNN yielded an AUC of 0.775, indicating that ESM2 sequence embeddings may contribute complementary contextual information that enhances prediction accuracy.

Conclusions: By unifying structure- and sequence-based protein foundation models, our approach achieves state-of-the-art, data-efficient prediction of peptide:MHC II binding.
Comparable in accuracy to the gold-standard NetMHCIIpan while trained on a dataset orders of magnitude smaller, this framework enables scalable and interpretable modeling of antigen presentation. Such structure-informed prediction can accelerate the discovery of CD4 T-cell epitopes relevant to neoantigen identification, vaccine development, and TCR/CAR-T engineering, where precise understanding of peptide:MHC recognition is essential. More broadly, our findings highlight the potential of foundation models to bridge molecular immunology and therapeutic design by providing generalizable, low-data solutions to complex antigen-presentation problems.

Citation Format: Kamel Lahouel, Mete Mulazimoglu, Jorge Soria-Bustos, Kameron Bates, Erin Kelley, Lawson Woods, Kunjur Manasa Upadhyaya, Gonzalo J. Acevedo, Sophie Pénisson, Matteo Munini, Ehsan Variani, Margaret E. Feeney, John A. Altin, Cristian Tomasetti. Structure-sequence integration for peptide:MHC class II binding prediction using AI foundation models (AlphaFold 3 and ESM2) [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 1298.
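Illustrative sketch (not the authors' implementation): the residue-level graph described in the Model and Experimental Procedures section could be built roughly as follows. The abstract does not state the distance cutoff, which atom represents each residue, or the feature dimensions, so the 8 Å cutoff, the use of one coordinate per residue (e.g. C-alpha), and the helper names `build_residue_graph` and `combine_node_features` are all assumptions made for illustration.

```python
import numpy as np


def build_residue_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff` angstroms.

    coords: (n_residues, 3) array, one position per residue (assumed C-alpha).
    cutoff: assumed distance threshold; the abstract does not specify one.
    Returns (edge_index, edge_dist): a (2, n_edges) array of directed edges
    and the matching (n_edges,) array of distances.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs under the cutoff and drop self-loops.
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst]), dist[src, dst]


def combine_node_features(spatial, embeddings):
    """Concatenate per-residue structural descriptors with sequence embeddings.

    Both inputs must have one row per node; column counts are arbitrary here.
    """
    spatial = np.asarray(spatial, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    if spatial.shape[0] != embeddings.shape[0]:
        raise ValueError("spatial and embeddings need one row per node")
    return np.concatenate([spatial, embeddings], axis=1)


# Toy example: four residues on a line; the last one is far from the rest.
toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]]
edge_index, edge_dist = build_residue_graph(toy_coords)

# Hypothetical feature sizes: 2 spatial descriptors and 5 embedding dimensions.
features = combine_node_features(np.zeros((4, 2)), np.ones((4, 5)))
```

In the toy example the first three residues are mutually connected and the fourth is isolated. The resulting edge list and feature matrix are the usual inputs to a GNN library; which library and architecture the authors used is not stated in the abstract.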