Abstract

Molecular dynamics (MD) simulations are essential for elucidating biomolecular function, yet the computational cost of all-atom models often limits their reach. Machine-learned coarse-grained (MLCG) models offer a solution by simplifying the representation while maintaining near-atomistic accuracy. However, the training of MLCG models currently requires vast amounts of force-labeled sample conformations from reference atomistic MD. Here, we overcome this limitation by unifying the training of MLCG models with the principles of generative diffusion models. We demonstrate that accurate high-dimensional distributions of molecular ensembles can be recovered by integrating traditional force-matching with denoising objectives. This framework enables the construction of physically consistent and stable force fields while reducing atomistic data requirements by up to two orders of magnitude. Validated across diverse protein folds and scales, our work establishes a bridge between molecular dynamics simulation and modern generative learning, substantially lowering the computational cost of constructing accurate MLCG models and broadening their applicability to large biomolecular systems.
Durumeric et al. (Sun,) studied this question.
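
To make the link between force-matching and denoising objectives concrete, the following is a minimal toy sketch, not the training procedure used in this work. It assumes a one-dimensional harmonic potential U(x) = k x^2 / 2 sampled at thermal energy kT, and a one-parameter force model F(x) = -a x. The function names, the values of `k`, `kT`, and `sigma`, and the sample counts are all illustrative assumptions. Force matching fits `a` from a handful of force-labeled samples, while denoising score matching fits the same parameter from unlabeled samples only, using the standard identity that the force of a Boltzmann distribution equals kT times its score.

```python
import math
import random


def fit_force_matching(xs, forces):
    """Least-squares slope a for the model F(x) = -a * x, using labeled forces."""
    num = sum(-x * f for x, f in zip(xs, forces))
    den = sum(x * x for x in xs)
    return num / den


def fit_denoising(xs, sigma, kT, rng):
    """Fit the same model from unlabeled samples via denoising score matching.

    Each sample is perturbed as x_noisy = x + sigma * eps, and the regression
    target is -kT * eps / sigma, i.e. kT times the score of the Gaussian
    noise kernel evaluated at x_noisy.
    """
    num = 0.0
    den = 0.0
    for x in xs:
        eps = rng.gauss(0.0, 1.0)
        x_noisy = x + sigma * eps
        target = -kT * eps / sigma
        num += -x_noisy * target
        den += x_noisy * x_noisy
    return num / den


# Illustrative (assumed) parameters for the toy system.
k = 2.0        # spring constant of the harmonic potential
kT = 1.0       # thermal energy
sigma = 0.1    # noise level used for denoising
rng = random.Random(0)

# Equilibrium samples of exp(-U / kT) are Gaussian with variance kT / k.
xs = [rng.gauss(0.0, math.sqrt(kT / k)) for _ in range(200_000)]

# Force matching: only 20 samples carry force labels.
labeled = xs[:20]
a_fm = fit_force_matching(labeled, [-k * x for x in labeled])

# Denoising: all samples are used, but no force labels are needed.
a_dsm = fit_denoising(xs, sigma, kT, rng)

# The denoising fit targets the force of the noise-smoothed distribution,
# whose variance is kT / k + sigma**2, so its slope is slightly below k.
a_expected = kT / (kT / k + sigma ** 2)

print(f"force matching slope:   {a_fm:.4f}")
print(f"denoising slope:        {a_dsm:.4f}")
print(f"smoothed-density slope: {a_expected:.4f}")
```

In this toy setting the denoising estimate converges to the slope of the noise-smoothed density rather than to `k` itself, and the gap shrinks as `sigma` decreases at the cost of higher estimator variance. That trade-off is one motivation for combining the two objectives rather than relying on either alone.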