What question did this study set out to answer?

The study aims to improve protein-peptide binding site prediction by combining sequence and structural information, which most existing predictors underuse.

May 8, 2026

MGAPep: LLM-Augmented Multimodal Graph Attention for Protein-Peptide Binding Site Prediction and Cross-Domain Transfer

Key Points

Aimed to improve protein-peptide binding site prediction by integrating complementary sequence and structural information in a single framework.
Introduced MGAPep, which fuses large language model embeddings with protein sequence and structural descriptors through a residual graph attention backbone and a multi-head dual-attention module.
Applied self-supervised pre-training, transfer learning, and task-specific fine-tuning on protein fragment-peptide interaction data.
Benchmarked the model extensively against baseline methods.
Reported state-of-the-art accuracy in protein-peptide binding site prediction, with robust generalization to unseen proteins and peptides.
Outperformed most baselines on protein-nucleic acid binding site prediction without architecture changes.
Provided evidence that graph-enhanced LLMs improve biomolecular binding modeling.

Abstract

Protein-peptide interactions drive peptide therapeutics, precision design, and biomarker discovery, yet most predictors underuse complementary sequence-structure information. LLM-augmented multimodal approaches offer a promising solution to these limitations. We introduce MGAPep, which fuses pre-trained large language model embeddings with protein sequence and structural descriptors via a residual graph attention backbone and a multi-head dual-attention module to capture fine-grained interface patterns. Leveraging large-scale corpora of protein fragment-peptide interaction data, MGAPep employs self-supervised pre-training, transfer learning, and task-specific fine-tuning to obtain rich, transferable representations. Extensive benchmarking shows consistent state-of-the-art accuracy for protein-peptide binding site prediction, with robust generalization to unseen proteins and peptides. The framework also transfers effectively across modalities, yielding superior performance to most baselines on protein-nucleic acid binding site prediction without architecture changes, underscoring broad applicability. Together with evidence that graph-enhanced LLMs improve biomolecular binding modeling, these results establish MGAPep as a general paradigm for protein-biomolecule interaction prediction.
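
To make the "residual graph attention backbone" more concrete, the sketch below shows one generic single-head graph attention layer with a residual connection, applied to per-residue features. It is an illustration under stated assumptions, not the MGAPep implementation: the abstract does not specify how features are fused or how the residue graph is built, so concatenating embeddings with descriptors, the chain-shaped neighbor graph, the function names, the dimensions, and the random placeholder values are all hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_gat_layer(features, neighbors, W, a_src, a_dst):
    """One single-head graph attention layer with a residual connection.

    features:  list of per-residue feature vectors, each of length d
    neighbors: dict mapping residue index -> list of neighbor indices
               (each list includes the residue itself)
    W:         d x d projection matrix (square so the residual add is valid)
    a_src, a_dst: attention vectors of length d
    """
    h = [matvec(W, x) for x in features]  # project every residue
    out = []
    for i, x in enumerate(features):
        nbrs = neighbors[i]
        scores = []
        for j in nbrs:
            e = dot(a_src, h[i]) + dot(a_dst, h[j])
            scores.append(e if e > 0 else 0.2 * e)  # LeakyReLU, slope 0.2
        alpha = softmax(scores)  # attention weights over the neighborhood
        agg = [sum(a * h[j][k] for a, j in zip(alpha, nbrs))
               for k in range(len(x))]
        out.append([xk + ak for xk, ak in zip(x, agg)])  # residual add
    return out


# Placeholder inputs (random numbers, not real embeddings or descriptors).
random.seed(0)
n_residues, d_llm, d_desc = 4, 3, 2
llm_emb = [[random.gauss(0, 1) for _ in range(d_llm)] for _ in range(n_residues)]
descriptors = [[random.gauss(0, 1) for _ in range(d_desc)] for _ in range(n_residues)]

# Assumed fusion step: concatenate the two feature sources per residue.
features = [e + s for e, s in zip(llm_emb, descriptors)]
d = d_llm + d_desc

W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
a_src = [random.gauss(0, 0.1) for _ in range(d)]
a_dst = [random.gauss(0, 0.1) for _ in range(d)]

# Hypothetical residue graph: a 4-residue chain with self-loops.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}

updated = residual_gat_layer(features, neighbors, W, a_src, a_dst)
```

In a full model, several such layers would typically be stacked with multiple attention heads, and a per-residue classifier on top would score each residue as binding or non-binding. The residual add keeps each residue's original fused features available to later layers, which is the usual motivation for residual connections in deep graph networks.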
