
April 15, 2026 · Open Access

Benchmarking the performance of uncertainty quantification methods for neural network-based interatomic potentials

Key Points

The study evaluates different uncertainty quantification (UQ) methods for neural network-based interatomic potentials.
It constructs a series of neural network potentials (NNPs) with varying network architectures.
Model performance is assessed with respect to both the mean and the uncertainty calibration error.
The methods are benchmarked on multiple datasets common in the ML-IAP literature.
The analysis focuses on the differences in behavior between epistemic and aleatoric uncertainty estimates.
Aleatoric uncertainty from single-shot model architectures is a competitive alternative to ensemble-based epistemic uncertainty predictions in regions of sufficient data density.
In regions where representative data is sparse, aleatoric models tend to overpredict and epistemic methods tend to underpredict the actual model error.
UQ performance depends on the regime in which a method is evaluated, so the method should be chosen against both the data characteristics and the requirements of the intended application.
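
To make the epistemic/aleatoric distinction concrete, here is a minimal sketch. It assumes an ensemble of models that each output a predicted mean and a predicted variance per configuration. The function name `decompose_uncertainty`, the array layout, and the toy numbers are illustrative assumptions, not the study's implementation. The spread of the member means serves as the epistemic estimate, and the average predicted variance serves as the aleatoric estimate.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split ensemble output into epistemic and aleatoric variance.

    means, variances: arrays of shape (n_members, n_samples) holding each
    member's predicted mean and predicted variance (hypothetical layout).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prediction = means.mean(axis=0)          # ensemble mean prediction
    epistemic_var = means.var(axis=0)        # disagreement between members
    aleatoric_var = variances.mean(axis=0)   # average predicted noise level
    return prediction, epistemic_var, aleatoric_var

# Toy values made up for illustration: 3 members, 2 configurations.
means = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
variances = [[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]]
pred, epi, alea = decompose_uncertainty(means, variances)
```

In this toy case the members agree on the first configuration, so its epistemic variance is zero, while they disagree on the second, which is the kind of signal an ensemble is expected to give in sparsely sampled regions.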

Abstract

Machine-learned interatomic potentials (ML-IAPs) continue to gain popularity as accurate, computationally efficient replacements for traditional, physics-based interatomic potentials and expensive ab initio methods. Uncertainty quantification (UQ) of ML-IAPs is a growing area of research as UQ is critical in many applications of IAPs, such as developing curated datasets, active learning-based data augmentation, self-improving models, and estimating the uncertainty of molecular dynamics simulations. In this paper, we construct and benchmark a series of different neural network potentials (NNPs) with varying network architectures to determine the performance of these models with respect to both the mean and uncertainty calibration error. Each NNP method is specifically designed to predict either epistemic or aleatoric uncertainty with particular focus on the differences in behavior between the epistemic and aleatoric uncertainty estimates. We benchmark these methods using multiple datasets common in the ML-IAP literature. The results show that the aleatoric uncertainty from single-shot model architectures is a competitive alternative to ensemble-based epistemic uncertainty predictions in regions of sufficient data density. However, in regions where the representative data is sparse, aleatoric uncertainty models tend to overpredict and epistemic methods tend to underpredict the actual model error. We conclude that the type of UQ is crucial when discussing performance of probabilistic model results as different methods have different performance characteristics depending on the regime in which they are evaluated. Therefore, the type of UQ method should be carefully evaluated against both the data characteristics and requirements for the intended application.
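
The abstract does not say how the uncertainty calibration error is computed. As a hedged illustration, the sketch below uses one common choice: assume each prediction is Gaussian with the predicted mean and standard deviation, then compare the expected fraction of true values falling below each quantile level with the fraction actually observed. The function name `calibration_error`, the quantile levels, and the synthetic data are assumptions made for this example only.

```python
import math
import numpy as np

def calibration_error(y_true, mean, std, n_levels=9):
    """Mean absolute gap between expected and observed quantile coverage."""
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    z = (y_true - mean) / std
    # Gaussian CDF at each true value under the predicted distribution.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    levels = np.linspace(0.1, 0.9, n_levels)
    # Fraction of samples whose true value lies below each predicted quantile.
    observed = np.array([(cdf <= p).mean() for p in levels])
    return float(np.abs(observed - levels).mean())

# Synthetic check: the true errors have unit standard deviation.
rng = np.random.default_rng(0)
n = 20000
mean = np.zeros(n)
y_true = rng.normal(0.0, 1.0, size=n)

well_calibrated = calibration_error(y_true, mean, np.full(n, 1.0))
overconfident = calibration_error(y_true, mean, np.full(n, 0.2))  # underpredicted error
```

A model whose predicted standard deviation matches the real error spread scores close to zero, while one that underpredicts the error, as the abstract reports for epistemic methods in sparse regions, scores noticeably higher.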
