We demonstrate that commonly reported metrics may lack the sensitivity to detect improvements in machine learning models, and we propose reporting a comprehensive set of performance metrics when evaluating and comparing clinical risk prediction models.
Huang et al. studied this question.
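The abstract does not name the specific metrics in the proposed list. As a minimal sketch, assuming a binary outcome and predicted risk probabilities, the hypothetical helper `metric_panel` below computes a few widely used metrics (AUROC, Brier score, log loss, and a simple difference-of-means form of calibration-in-the-large) in plain Python. The toy data are invented for illustration and are not from the paper. They show one way a commonly reported metric can miss a change: halving every prediction preserves the ranking, so AUROC is identical, while the Brier score and calibration clearly worsen.

```python
import math
from typing import Dict, Sequence


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk (0 means calibrated on average)."""
    return sum(y) / len(y) - sum(p) / len(p)


def metric_panel(y: Sequence[int], p: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: report several metrics together instead of AUROC alone."""
    return {
        "auroc": auroc(y, p),
        "brier": brier(y, p),
        "log_loss": log_loss(y, p),
        "citl": calibration_in_the_large(y, p),
    }


# Invented toy data: model B halves model A's risks, so the ranking is unchanged.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
p_b = [pi / 2 for pi in p_a]

if __name__ == "__main__":
    for name, preds in (("model A", p_a), ("model B", p_b)):
        print(name, metric_panel(y, preds))
```

Running the script prints the same AUROC for both models but a higher Brier score, higher log loss and a nonzero calibration-in-the-large for model B, which is the kind of difference a single discrimination metric would not reveal.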