Binary classification is one of the most common supervised machine-learning problems. Several metrics have been defined in the literature to assess the performance of binary classification machine-learning models. However, using different metrics to compare two or more models may yield different results, often prompting comparative studies on the best metric for performance analysis. The current paper addresses this topic by developing a theoretical framework, which is validated through examples of real-world binary classification problems. As a first step, the paper defines the concept of equivalent metrics and identifies all pairs of State-of-the-Art metrics that yield the same conclusion when two classifiers are compared. The paper then identifies a specific classification threshold, called the “Point-of-Balanced Performance” (PoBP), for which the entire set of State-of-the-Art performance metrics yields consistent results when comparing classifiers. The paper also identifies the geometrical representation of the PoBP in the Receiver Operating Characteristic curve. Although identifying the PoBP during the training phase is trivial, this is not the case for inference. The paper defines and compares various approximation methods for identifying the PoBP during inference. The results of the analysis are then applied to real-world examples, indicating that the PoBP can become the preferred approach without excluding the option of selecting a State-of-the-Art approach depending on the specific problem characteristics. Overall, the paper provides useful theoretical insights and new tools for approaching binary classification analysis.
Markoulidakis et al. studied this question.
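The claim that different metrics can rank the same two classifiers differently can be illustrated with a minimal sketch. The confusion-matrix counts below are invented for illustration and do not come from the paper; the names `ConfusionMatrix` and `preferred` are hypothetical helpers. The metric formulas are the standard textbook definitions of accuracy, F1 score, and balanced accuracy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier at one fixed threshold."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives


def accuracy(cm: ConfusionMatrix) -> float:
    # Fraction of all samples that are classified correctly.
    return (cm.tp + cm.tn) / (cm.tp + cm.fn + cm.fp + cm.tn)


def f1_score(cm: ConfusionMatrix) -> float:
    # Harmonic mean of precision and recall, written in terms of counts.
    return 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)


def balanced_accuracy(cm: ConfusionMatrix) -> float:
    # Mean of sensitivity (TPR) and specificity (TNR).
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    return (sensitivity + specificity) / 2


def preferred(metric: Callable[[ConfusionMatrix], float],
              a: ConfusionMatrix, b: ConfusionMatrix) -> str:
    # Name of the classifier that the given metric ranks higher.
    score_a, score_b = metric(a), metric(b)
    if score_a == score_b:
        return "tie"
    return "A" if score_a > score_b else "B"


# Illustrative (made-up) results on the same imbalanced test set:
# 100 positives and 900 negatives.
clf_a = ConfusionMatrix(tp=60, fn=40, fp=20, tn=880)  # conservative
clf_b = ConfusionMatrix(tp=90, fn=10, fp=70, tn=830)  # more permissive

for metric in (accuracy, f1_score, balanced_accuracy):
    print(f"{metric.__name__}: "
          f"A={metric(clf_a):.4f}, B={metric(clf_b):.4f}, "
          f"preferred={preferred(metric, clf_a, clf_b)}")
```

With these assumed counts, accuracy favors classifier A, whereas the F1 score and balanced accuracy favor classifier B. This is the kind of disagreement that motivates looking for an operating point at which the metrics agree.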