April 17, 2026 | Open Access

Machine learning approaches for protein binder design

Key Points

Aimed to develop and enhance computational tools for protein binder design using machine learning.
Developed an HPC-enabled dynamic pipeline of pre-trained foundation models for iterative protein optimization.
Examined sequence diversity on a viral protein interface using a structure-aware graph classifier.
Implemented a sequence-based contrastive learning network to predict PDZ-peptide binding affinities.
Generated high-quality geometric data through a full structure prediction pipeline.
Achieved measurably improved design quality compared to baseline techniques.
Demonstrated excellent generalization to out-of-distribution data with the graph classifier.
Showed significant predictive power for binder activity with the contrastive learning framework.

Abstract

Computational protein design has become a critical area of research in recent years. With the advent of deep learning, several foundation models have emerged that give researchers unprecedented insight into the relationship between sequence and function. As more advanced neural networks continue to be developed, we must examine how these tools can be leveraged to engineer novel and effective design protocols. Furthermore, harnessing the full potential of machine learning requires a thorough examination of how existing architectures can be applied to unexplored problems in protein science. Here, we present our efforts to contribute to the development of computational protein design tools. In chapter one, we develop an HPC-enabled dynamic pipeline of pre-trained foundation models. This framework iteratively cycles and gradually optimizes proteins as they converge on high-quality therapeutic binders. Our method achieves measurably improved design quality over baseline techniques and shows how dynamic compute resource allocation can improve the efficiency of functional landscape traversal. In chapter two, we use these large models to examine the allowable sequence diversity along one side of a viral protein interface. We sample functional variants at unprecedented mutational depth and use these data to train a structure-aware graph classifier. Our model achieves excellent generalization to out-of-distribution data, allowing for distal variant effect forecasting. In chapter three, we develop a sequence-based contrastive learning network that learns a joint latent space indicating whether a given PDZ-peptide pair will bind. This framework achieves excellent results, exhibits significant generalizability, and is readily transferable to a diverse suite of other protein binding systems. Across all datasets, our method performs comparably to state-of-the-art prediction techniques. We push the contrastive model further by applying it to both designed domains and peptides for rapid candidate screening. Binders that the network scores favorably show significant activity against their targets experimentally, further corroborating its strong predictive power. Finally, in chapter four, we show how structure-based graphs allow for similar recapitulation of protein binding data. Here, we present our full structure prediction pipeline, which generates high-quality geometric data for downstream tasks. While our graph model performs robustly on held-out validation sets, it still falls short of the contrastive framework’s generalizability. We then propose several future avenues that would more fully harness the information-rich structural representations of protein binder samples to empower prediction.
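
To make the idea of a joint latent space for domain-peptide pairs more concrete, the following is a minimal sketch of one common contrastive objective, a CLIP-style symmetric cross-entropy over cosine similarities. The abstract does not state which loss, encoders, or temperature the study uses, so everything here is an illustrative assumption: the function names are hypothetical, the temperature of 0.1 is arbitrary, and the embeddings are random toy vectors standing in for the outputs of learned sequence encoders.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(domains, peptides):
    """Cosine similarity between every domain and every peptide embedding."""
    d = [normalize(v) for v in domains]
    p = [normalize(v) for v in peptides]
    return [[sum(a * b for a, b in zip(dv, pv)) for pv in p] for dv in d]


def mean_diagonal_nll(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(domains, peptides, temperature=0.1):
    """Assumed objective: row i of each list is a known binding pair.

    The loss pulls paired embeddings together and pushes all other
    in-batch combinations apart, in both directions
    (domain -> peptide and peptide -> domain).
    """
    sims = cosine_matrix(domains, peptides)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_diagonal_nll(logits) + mean_diagonal_nll(transposed))


# Toy data: random "domain" embeddings and near-identical "peptide" partners.
rng = random.Random(0)
domains = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
matched = [[x + rng.gauss(0, 0.01) for x in v] for v in domains]
mismatched = matched[1:] + matched[:1]  # rotate so no pair lines up

loss_matched = symmetric_contrastive_loss(domains, matched)
loss_mismatched = symmetric_contrastive_loss(domains, mismatched)
print(f"matched pairs:    {loss_matched:.3f}")
print(f"mismatched pairs: {loss_mismatched:.3f}")
```

In a setup like this, the same similarity matrix can be reused at inference time to rank candidate peptides for a domain, which is one plausible way a trained model could support the rapid candidate screening described above.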
