Peptide binders are a critical drug modality that bridges small-molecule compounds and protein macromolecules, and they can effectively mimic the secondary structural elements of natural proteins. Peptides offer distinct physicochemical advantages when targeting protein-protein interaction (PPI) interfaces, which are typically characterized by flat surfaces and extensive contact areas. Recently, diffusion models such as RFdiffusion have established a new computational paradigm for protein backbone generation by defining a denoising process over the group of rigid-body transformations. However, in the de novo design of binders against “undruggable” PPI targets, this general paradigm adapts poorly, for three reasons. First, its underlying rigid-body assumption struggles to describe the dynamic induced-fit process of peptides at the binding interface. Second, it is not sufficiently robust to the heterogeneity in experimental resolution inherent in the training data. Third, the decoupled two-stage generation of sequence and structure severs the coupling between geometry and physicochemical properties, yielding backbones with idealized, uniform secondary structures that lack realistic spatial binding capacity and plausible side-chain physicochemical features. To address these challenges, this study proposes PPI-Diff, a novel generative framework.
While preserving the generative capability of diffusion models, PPI-Diff introduces three core mechanisms: (1) a resolution-aware constraint mechanism that encodes the measurement precision of experimental data as explicit contextual constraints, dynamically suppressing geometric noise from low-resolution samples; (2) an internal-coordinate-driven manifold diffusion model that evolves conformations on a Riemannian manifold parameterized by dihedral angles, balancing local stereochemical validity with the precise capture of flexible peptide conformations; and (3) a geometry-semantic synergistic modeling mechanism that uses the evolutionary embeddings of a pre-trained protein language model (ESM-2) as latent variables to align structure generation with biophysical function. Systematic benchmarking shows that, on a strictly non-homologous test set, binders generated by PPI-Diff significantly outperform existing baseline models in interface contact density, stereochemical validity, and sequence novelty.
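To make the idea of diffusion over dihedral angles concrete, the following minimal sketch shows one forward noising step on the torus of backbone (phi, psi) angles. It is an illustration under stated assumptions, not the PPI-Diff implementation: the wrapped-Gaussian noise model, the function names (`wrap_angle`, `noise_dihedrals`, `angular_difference`), and the value of `sigma` are all hypothetical choices made here for clarity.

```python
import numpy as np

def wrap_angle(theta):
    """Map angles in radians onto the circle, represented as [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi

def noise_dihedrals(angles, sigma, rng):
    """One forward noising step (assumed wrapped-Gaussian model):
    add Gaussian noise in angle space, then wrap back onto the torus."""
    return wrap_angle(angles + sigma * rng.standard_normal(angles.shape))

def angular_difference(a, b):
    """Shortest signed difference a - b on the circle, in radians."""
    return wrap_angle(a - b)

rng = np.random.default_rng(0)

# (phi, psi) for a 5-residue peptide, set to typical alpha-helix values.
helix = np.tile(np.deg2rad([-57.0, -47.0]), (5, 1))

# sigma = 0.5 rad is an arbitrary illustrative noise scale.
noisy = noise_dihedrals(helix, sigma=0.5, rng=rng)

# Errors are measured with the periodic difference, so angles that
# straddle the +/-180 degree boundary are treated as close together.
deviation = angular_difference(noisy, helix)
```

Working in dihedral space keeps bond lengths and bond angles fixed by construction, which is one plausible reading of how an internal-coordinate model preserves local stereochemical validity while still allowing the backbone to flex.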
Dong et al. (Wed,) studied this question.