Abstract
Machine-learned interatomic potentials (ML-IAPs) continue to gain popularity as accurate, computationally efficient replacements for traditional, physics-based interatomic potentials and expensive ab initio methods. Uncertainty quantification (UQ) of ML-IAPs is a growing area of research, as UQ is critical in many applications of IAPs, such as developing curated datasets, active learning-based data augmentation, self-improving models, and estimating the uncertainty of molecular dynamics simulations. In this paper, we construct and benchmark a series of neural network potentials (NNPs) with varying network architectures to evaluate these models with respect to both the error of the mean prediction and the calibration error of the uncertainty estimate. Each NNP method is designed to predict either epistemic or aleatoric uncertainty, and we focus in particular on the differences in behavior between the epistemic and aleatoric uncertainty estimates. We benchmark these methods using multiple datasets common in the ML-IAP literature. The results show that aleatoric uncertainty from single-shot model architectures is a competitive alternative to ensemble-based epistemic uncertainty in regions of sufficient data density. However, in regions where representative data are sparse, aleatoric uncertainty models tend to overpredict, and epistemic methods tend to underpredict, the actual model error. We conclude that the type of UQ is crucial when discussing the performance of probabilistic models, as different methods have different performance characteristics depending on the regime in which they are evaluated. Therefore, the choice of UQ method should be weighed carefully against both the data characteristics and the requirements of the intended application.
Wimer et al. (Mon,) studied this question.
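To make the epistemic/aleatoric distinction and the notion of calibration error concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the exact calibration metric, so the coverage-based score below is an assumption, and all function names (`ensemble_mean_and_std`, `gaussian_nll`, `calibration_error`) are hypothetical. The ensemble spread stands in for epistemic uncertainty; the Gaussian negative log-likelihood is a loss commonly used to train single-shot mean-variance networks whose predicted sigma is read as aleatoric uncertainty.

```python
import math
import numpy as np

def ensemble_mean_and_std(predictions):
    """Epistemic estimate from an ensemble (hypothetical helper).

    predictions: shape (n_models, n_samples), e.g. energies from independently
    trained networks. The spread across models is taken as epistemic uncertainty.
    """
    preds = np.asarray(predictions, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

def gaussian_nll(y, mu, sigma):
    """Per-sample Gaussian negative log-likelihood.

    Assumed training loss for a single-shot network with a second output head
    (sigma) interpreted as aleatoric uncertainty.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)

def calibration_error(y, mu, sigma, n_levels=99):
    """Mean gap between expected and observed coverage of central Gaussian intervals.

    p_i is the probability mass of the smallest central interval that still
    contains observation i. For a calibrated predictor p_i ~ Uniform(0, 1), so
    the observed coverage matches every nominal level q.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    z = np.abs(y - mu) / sigma
    p = np.array([math.erf(v / math.sqrt(2.0)) for v in z])
    levels = np.linspace(0.01, 0.99, n_levels)
    observed = np.array([(p <= q).mean() for q in levels])
    return float(np.mean(np.abs(observed - levels)))

if __name__ == "__main__":
    # Synthetic data: true errors are drawn with the predicted sigma.
    rng = np.random.default_rng(0)
    n = 20000
    mu = rng.normal(size=n)
    sigma = rng.uniform(0.5, 2.0, size=n)
    y = mu + sigma * rng.standard_normal(n)
    print("calibrated:    ", calibration_error(y, mu, sigma))
    print("overconfident: ", calibration_error(y, mu, sigma / 3.0))  # underpredicts error
    print("underconfident:", calibration_error(y, mu, sigma * 3.0))  # overpredicts error
```

Both failure modes described in the abstract, underpredicting and overpredicting the actual error, raise this score, which is why a single calibration number should be read together with the data-density regime in which it was measured.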