Reliable methods to quantify the predictive uncertainty of machine learning (ML) models can significantly increase the impact of molecular property prediction and are routinely used in applications like active learning and ML-guided property optimization. Poor predictive accuracy of ML models is often related to (i) regions of chemical space characterized by large property differences between structurally similar molecules, and (ii) a lack of representation of test molecules in the training data. Here, we analyze the relationship between these error sources and the predictive uncertainty of popular uncertainty quantification (UQ) methods on molecular activity data sets. We find that several UQ methods struggle to identify poorly predicted compounds in regions of steep structure–activity relationships (SAR). We also demonstrate that the evaluation scenario, as defined by how the data are split into training and test sets, significantly impacts observed UQ performance. Based on our findings, we introduce a simple yet strong and robust UQ method that offers significant improvements over previous approaches in several evaluation scenarios, and we demonstrate its usefulness in an exploratory active learning setting.
Kötter et al. studied this question.