What question did this study set out to answer?

The aim is to develop a framework for predicting protein-protein interactions (PPIs) using advanced optimization and embedding-based methods.

May 31, 2026 · Open Access

A robust framework for protein-protein interaction prediction with multi-objective ensemble learning and embedding-based representations

Key Points

The aim is to develop a framework for predicting protein-protein interactions (PPIs) using advanced optimization and embedding-based methods.
The hybrid framework uses the Prot-T5-XL-Uniref-50 protein language model to embed protein sequences.
The multi-objective non-dominated sorting genetic algorithm-II (NSGA-II) is combined with a random forest classifier to improve prediction.
Feature contributions are analyzed with SHapley Additive exPlanations (SHAP) to visualize the most influential embedding dimensions.
The proposed method outperformed state-of-the-art PPI prediction approaches on four benchmark datasets.
NSGA-II simultaneously maximizes prediction accuracy and classifier diversity.
Quantitative feature analysis enhances model interpretability and robustness.
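NSGA-II ranks candidate solutions by Pareto dominance, which is how accuracy and classifier diversity can be optimized together. The sketch below is a minimal pure-Python version of the fast non-dominated sort from the original NSGA-II algorithm. It assumes both objectives are maximized. It omits the crowding-distance and genetic-operator steps that full NSGA-II also uses. The function names and the candidate scores are hypothetical illustrations, not code or results from the study.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting as in NSGA-II; returns fronts of indices, best first."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # indices each point dominates
    domination_count = [0] * n  # how many points dominate each point
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nothing dominates p: it is on the Pareto front
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)  # q is dominated only by earlier fronts
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (accuracy, diversity) scores for five candidate ensembles.
candidates = [(0.90, 0.10), (0.88, 0.20), (0.85, 0.15), (0.80, 0.25), (0.84, 0.12)]
print(non_dominated_sort(candidates))  # [[0, 1, 3], [2], [4]]
```

Candidates 0, 1, and 3 form the first (Pareto-optimal) front. No other candidate is at least as accurate and at least as diverse as any of them. A final configuration, such as a particular number of trees, would then be picked from that front.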

Abstract

Protein–protein interactions (PPIs) play a key role in virtually all cellular processes. However, experimental identification of PPIs remains costly, time-consuming, and often incomplete. To address these challenges, this study presents a hybrid adaptive framework for PPI prediction that integrates modern protein language models with evolutionary optimization and ensemble learning. The framework uses the protein language model Prot-T5-XL-Uniref-50 to embed protein sequences, capturing rich contextual, structural, and physicochemical information. The resulting high-dimensional representations are then compressed with uniform manifold approximation and projection (UMAP) to reduce computational complexity. A hybrid approach coupling the multi-objective non-dominated sorting genetic algorithm-II (NSGA-II) with a random forest is then proposed to enhance classifier robustness. This evolutionary strategy simultaneously maximizes prediction accuracy and classifier diversity. It also estimates the optimal number of trees for the ensemble from the Pareto-optimal fronts. Comparisons with state-of-the-art methods show that the proposed method performs better on four benchmark datasets: Human, E. coli, Drosophila, and C. elegans. Finally, SHapley Additive exPlanations (SHAP) were used to quantify and visualize each feature's contribution to the model's predictions, making it possible to rank and examine the most influential embedding dimensions. Overall, the proposed framework offers a reliable and robust solution for large-scale PPI prediction based solely on protein sequence data.
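The abstract does not say how classifier diversity is measured. One common choice, assumed here purely for illustration, scores a candidate random-forest sub-ensemble on two objectives. The first is its majority-vote accuracy. The second is the mean pairwise disagreement between its trees' binary predictions (1 = interacting pair). The names `ensemble_objectives` and `majority_vote`, the tie-breaking rule, and the toy predictions are all hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Binary majority vote; ties resolve to 1 (interacting) by assumption."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


def ensemble_objectives(
    tree_preds: Sequence[Sequence[int]],  # tree_preds[t][i]: tree t's 0/1 prediction on sample i
    labels: Sequence[int],
) -> Tuple[float, float]:
    """Return (majority-vote accuracy, mean pairwise disagreement) for one sub-ensemble."""
    n_samples = len(labels)
    votes_per_sample = list(zip(*tree_preds))  # regroup predictions by sample
    correct = sum(majority_vote(v) == y for v, y in zip(votes_per_sample, labels))
    accuracy = correct / n_samples
    pairs = list(combinations(tree_preds, 2))
    if not pairs:  # a single tree has no pairwise diversity
        return accuracy, 0.0
    # Fraction of samples on which two trees disagree, averaged over all tree pairs.
    disagreement = sum(
        sum(a != b for a, b in zip(p, q)) / n_samples for p, q in pairs
    ) / len(pairs)
    return accuracy, disagreement


# Hypothetical predictions of three trees on four labelled protein pairs.
labels = [1, 0, 1, 0]
trees = [[1, 0, 1, 1], [1, 0, 0, 0], [1, 1, 1, 0]]
print(ensemble_objectives(trees, labels))  # (1.0, 0.5)
```

Each tree errs on some samples, yet the majority vote is correct on all four. That outcome is why diversity is worth rewarding alongside accuracy. Scoring sub-ensembles of different sizes this way gives one (accuracy, diversity) point per candidate tree count. A multi-objective optimizer such as NSGA-II can then rank those points by Pareto dominance.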