May 16, 2026 | Open Access

Learning dynamics vs. aggregate metrics: clinical applicability of machine learning survival prediction in eye cancer

Key Points

Assessed the clinical applicability of machine learning survival prediction models in eye cancer, focusing on their learning dynamics.
Compared two ensemble learning approaches, CatBoost and RUSBoost, for five-year survival prediction in eye cancer.
Evaluated models on classification performance and on learning dynamics during the training and validation phases.
Analyzed area under the receiver operating characteristic curve (AUC) and confusion matrices.
Both models had similar discriminative ability on held-out test data, with an AUC of approximately 0.78.
RUSBoost showed healthy learning dynamics, with parallel training and validation curves separated by a stable, narrow gap, whereas CatBoost showed a progressively widening gap indicative of overfitting.
RUSBoost is recommended as the more trustworthy model for clinical application, given its healthier learning behavior.
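
The contrast between parallel and diverging learning curves can be made concrete with a small diagnostic. The sketch below is a minimal illustration, not the study's procedure: the function names, the tolerance `slope_tol`, and the two sets of curve values are all hypothetical and were invented for the example. It fits a least-squares line to the training-minus-validation gap and reports whether that gap is roughly stable or widening.

```python
from statistics import mean


def gap_trend(train_scores, val_scores):
    """Return (mean_gap, slope) of the train-minus-validation gap over iterations."""
    if len(train_scores) != len(val_scores) or len(train_scores) < 2:
        raise ValueError("need two curves of equal length with at least 2 points")
    gaps = [t - v for t, v in zip(train_scores, val_scores)]
    xs = list(range(len(gaps)))
    x_bar, g_bar = mean(xs), mean(gaps)
    # Ordinary least-squares slope of gap versus iteration index.
    num = sum((x - x_bar) * (g - g_bar) for x, g in zip(xs, gaps))
    den = sum((x - x_bar) ** 2 for x in xs)
    return g_bar, num / den


def describe(train_scores, val_scores, slope_tol=1e-3):
    """Label the curves 'widening' if the gap grows faster than slope_tol per step."""
    _, slope = gap_trend(train_scores, val_scores)
    return "widening" if slope > slope_tol else "stable"


# Illustrative values only (NOT taken from the study).
parallel_train = [0.70, 0.74, 0.77, 0.79, 0.80]
parallel_val = [0.68, 0.72, 0.75, 0.77, 0.78]
diverging_train = [0.70, 0.78, 0.85, 0.91, 0.96]
diverging_val = [0.68, 0.74, 0.77, 0.78, 0.78]

print(describe(parallel_train, parallel_val))
print(describe(diverging_train, diverging_val))
```

A single slope is a coarse summary. In practice the curves themselves should still be inspected, and any threshold would need to be chosen for the metric and the number of iterations at hand.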

Abstract

Introduction: The application of machine learning in healthcare requires models that demonstrate not only acceptable classification performance but also trustworthy learning behavior suitable for clinical deployment. Class imbalance represents a pervasive challenge in medical datasets, where patients with favorable outcomes substantially outnumber those with adverse events.
Materials and methods: This study compared two ensemble learning approaches for five-year survival prediction in eye cancer: CatBoost, a gradient boosting algorithm employing balanced class weights, and RUSBoost, an algorithm integrating random undersampling directly within the boosting framework. Model evaluation extended beyond aggregate performance metrics to include systematic assessment of learning dynamics throughout training.
Results: Both classifiers achieved comparable discriminative ability on held-out test data, with area under the receiver operating characteristic curve values of approximately 0.78. Confusion matrix analysis revealed that both models demonstrated acceptable classification rates with expected gradual decreases from training through validation to test partitions. However, examination of learning curves revealed a critical distinction: the RUSBoost classifier exhibited healthy learning dynamics characterized by parallel training and validation curves with a stable and narrow gap, whereas the CatBoost classifier displayed progressively widening divergence between training and validation performance indicative of overfitting that necessitated early stopping intervention. A practitioner examining only confusion matrices and aggregate metrics might reasonably but incorrectly favor CatBoost based on its marginal advantage in classification consistency.
Conclusions: These findings demonstrate that model selection in medical artificial intelligence must prioritize transparency in learning dynamics over aggregate performance metrics alone, as models achieving favorable summary statistics through problematic learning pathways cannot be considered trustworthy for clinical application where patient outcomes depend on prediction reliability. This study establishes evaluation criteria to ensure that, when machine learning-based decision support is considered appropriate for a given clinical context, the selected model exhibits learning behavior consistent with genuine predictive capability.
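
For readers unfamiliar with the two imbalance strategies named in the Materials and methods, the following sketch contrasts them on a toy label vector. It is an illustration under stated assumptions, not the study's implementation: the helper names are hypothetical, the weights use a common inverse-frequency convention that may differ from CatBoost's own balanced weighting, and the undersampling step equalizes class sizes, whereas the target ratio in RUSBoost implementations is typically configurable.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Class weights n / (k * count_c): rarer classes receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def random_undersample(labels, rng):
    """Return sorted indices with every class randomly cut to the minority size."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(ix) for ix in by_class.values())
    kept = []
    for ix in by_class.values():
        kept.extend(rng.sample(ix, n_min))
    return sorted(kept)


# Toy labels (hypothetical): 8 majority-class cases and 2 minority-class cases.
labels = [0] * 8 + [1] * 2

weights = inverse_frequency_weights(labels)
print(weights)

# In a RUSBoost-style loop, a fresh undersample like this one would be drawn
# at every boosting round before fitting that round's weak learner.
rng = random.Random(0)
idx = random_undersample(labels, rng)
print(Counter(labels[i] for i in idx))
```

The weighting approach keeps every sample and rescales its influence, while undersampling discards majority-class samples in each round. That difference is one plausible reason the two models could follow different learning trajectories, though the sketch itself does not demonstrate this.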
