Is it possible to obtain a sufficiently accurate quantum mechanical (QM) energy of an arbitrary oligopeptide structure in an implicit solvent within a second? Herein, we explore the possibility of constructing potential energy surfaces of larger peptides from rigorous quantum chemical data acquired for hundreds of thousands of capped mono-, di-, and tripeptides. We demonstrate that modern machine-learning methods, in particular NequIP, when trained only on tripeptides, can already predict QM energies of random decapeptides with a root-mean-square error (RMSE) of <2 kcal mol⁻¹. The models also perform well on other out-of-distribution tasks: conformer ranking of a 31-peptide (identifying the global minimum), geometry optimization (RMSD 0.009 Å), and prediction of side-chain interaction energies on PDB structures (with sub-kcal mol⁻¹ accuracy). We show that the success of the ML approach is critically dependent on two factors: (i) inclusion of off-equilibrium structures from hot MD sampling, which includes systematic sampling of dihedral angles, and (ii) training on energies of solvated structures, instead of gas-phase energies. Solvated systems are both easier to predict by ML and a more relevant model of typical biomolecular interactions. We make all datasets and models available at doi.org/10.5281/zenodo.15356387.
Andris et al. studied this question.