Current metrics for binary classification, such as the Area Under the Receiver Operating Characteristic curve (AUC-ROC) or Log Loss, provide a global performance score. However, they do not quantify predictive quality separately for the event and non-event classes. This limitation is particularly critical in imbalanced settings such as medical diagnostics. To address it, we introduce the U-smile Likelihood Evaluation (LE) method, a substantial extension of the original U-smile framework. The U-smile LE method is based on a new metric called the relative Likelihood Ratio (rLR). This single score measures overall model strength without needing a classification threshold. We decompose this score into two class-specific components, \(rLR_1\) for the event class and \(rLR_0\) for the non-event class, and visualize them simultaneously in a compact U-shaped plot. We validated the U-smile LE method on synthetic datasets with varying class imbalance and on a real-world clinical Heart Disease dataset. In severely imbalanced scenarios (90/10 class distribution), stepwise variable selection guided by U-smile LE outperformed traditional AUC-based selection, improving minority-class detection by 16% in the Area Under the Precision-Recall curve (AUC-PR) and by 21% in F1-score. The evolution of U-smile patterns during variable selection provided clear, interpretable insight into the class-specific contributions of individual predictors. Demonstrated with both logistic regression and random forest models, U-smile LE offers an explainable, model-agnostic framework for evaluating binary classifiers, especially valuable where class imbalance and interpretability are key concerns.
Więckowska et al. studied this question.
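The abstract does not give the formula for rLR, so the following is only a rough sketch of the general idea of splitting a threshold-free, likelihood-based score into event and non-event parts. It is not the authors' metric. The functions `classwise_nll` and `classwise_relative_gain`, the prevalence-only reference model, and the toy data are all assumptions introduced here for illustration.

```python
import numpy as np


def classwise_nll(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood summed separately over non-events and events.

    Hypothetical helper: assumes y_true contains both classes (0 and 1)
    and p_pred holds predicted probabilities of the event class.
    """
    y = np.asarray(y_true, dtype=int)
    # Clip to avoid log(0) for probabilities of exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    nll_nonevent = -np.sum(np.log(1 - p[y == 0]))
    nll_event = -np.sum(np.log(p[y == 1]))
    return nll_nonevent, nll_event


def classwise_relative_gain(y_true, p_ref, p_new):
    """Relative reduction in negative log-likelihood versus a reference model.

    Illustrative stand-in only, NOT the paper's rLR definition.
    A value of 0 means no improvement over the reference; values closer
    to 1 mean the new model explains that class much better.
    """
    ref0, ref1 = classwise_nll(y_true, p_ref)
    new0, new1 = classwise_nll(y_true, p_new)
    return {
        "non_event": 1 - new0 / ref0,
        "event": 1 - new1 / ref1,
        "overall": 1 - (new0 + new1) / (ref0 + ref1),
    }


if __name__ == "__main__":
    # Toy imbalanced sample: 8 non-events, 2 events (made-up data).
    y = np.array([0] * 8 + [1] * 2)
    # Reference model: predicts the observed prevalence for everyone.
    p_ref = np.full(y.shape, y.mean())
    # Candidate model: made-up probabilities that separate the classes.
    p_new = np.where(y == 1, 0.7, 0.1)
    for name, gain in classwise_relative_gain(y, p_ref, p_new).items():
        print(f"{name}: {gain:.3f}")
```

In this sketch the overall gain is a weighted average of the two class-specific gains, so a single global score can hide a model that improves one class while barely helping the other. That is the kind of asymmetry a class-wise decomposition is meant to expose.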