Hematopoietic cell transplantation (HCT) for sickle cell disease (SCD) has excellent outcomes overall, but heterogeneity in outcomes between patients is a barrier to informed decision making. We created SPRIGHT, a machine learning model for individualized prediction trained on 1,641 SCD HCT cases from 1991–2020, and performed extensive internal cross-validation (Chandrashekhar et al., JMIR AI 2025). Clinical adoption of prediction models requires testing the reliability and generalizability of their performance outside the development setting through external validation across populations. Temporal validation is a form of external validation that assesses robustness to evolving clinical practices and patient profiles. Our objectives were to create and demonstrate a novel, shareable, browser-based validation framework built with open-source tools, to conduct temporal external validation of the SPRIGHT model, and to facilitate future temporal and geographic validations. We implemented SPRIGHT using Gradio, an open-source Python library, and deployed it on Hugging Face, which allows external users to input new data and obtain performance metrics without access to the original model code or dataset. We evaluated discrimination (accuracy, balanced accuracy, recall, precision, and area under the curve (AUC)) and calibration (calibration slope, intercept, and calibration curves). We performed temporal external validation of predictions for overall survival (OS), event-free survival (EFS), graft failure (GF), acute graft-versus-host disease (aGVHD), and chronic graft-versus-host disease (cGVHD) on an independent cohort of 286 patients who underwent HCT in 2021–2023. The model showed moderate performance, with a reduction across these metrics that suggests early signs of model degradation. Using chi-square contingency tests, we identified data drift in key clinical variables between the periods before and after 2020, including a significant increase in HLA-mismatched related (haploidentical) donors after 2020 (p<0.001), along with improved EFS after haploidentical HCT (p<0.01).
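The discrimination and calibration metrics named above can be illustrated with a minimal, dependency-free Python sketch. It is not the SPRIGHT implementation: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration. The calibration slope and intercept are obtained here by jointly fitting a logistic recalibration model; some authors instead estimate the intercept with the slope fixed at 1.

```python
import math

def auc(y_true, y_prob):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, y_prob, threshold=0.5):
    """Mean of sensitivity and specificity at an assumed decision threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def calibration_slope_intercept(y_true, y_prob, iters=25, eps=1e-6):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson; returns (a, b)."""
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    x = [math.log(p / (1 - p)) for p in clipped]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        mu = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b)
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Entries of the 2x2 information matrix
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy data, invented for illustration only (not study data)
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
intercept, slope = calibration_slope_intercept(y, p)
print(auc(y, p), balanced_accuracy(y, p), round(intercept, 3), round(slope, 3))
```

A slope near 1 and an intercept near 0 indicate well-calibrated predictions; a slope below 1 suggests predictions that are too extreme.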
Because data drift and concept drift can undermine model reliability and patient safety, we retrained the model by incorporating post-2021 data while retaining the original model hyperparameters. The updated models maintained strong discrimination and calibration, with AUCs of 0.74 (OS), 0.76 (EFS), 0.73 (GF), 0.66 (aGVHD), and 0.72 (cGVHD), and with calibration curves and slope/intercept values indicating good calibration in the <0.5 probability range, demonstrating their generalizability and utility across evolving patient populations. We report a novel, browser-accessible external validation framework that, by avoiding the logistical barriers of infrastructure and data transfer, supports reproducible and scalable model validation across centers, and we demonstrate that SPRIGHT maintains acceptable performance after being retrained with new data.
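A chi-square contingency test of the kind used to detect drift in a categorical variable can be sketched in plain Python as follows. The counts are invented for illustration and are not study data, and the function name and table layout are assumptions. The sketch uses the uncorrected Pearson statistic and supports only 2x2 tables, where the p-value for one degree of freedom has a closed form; statistical libraries may apply a continuity correction by default for 2x2 tables, which gives slightly different values.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table[i][j] is the count for era i (e.g. before/after) and category j.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (NOT from the study): rows are eras, columns are donor type
#                haploidentical, other
donor_counts = [[20, 80],   # earlier era
                [45, 55]]   # later era
stat, p_value = chi_square_2x2(donor_counts)
print(f"chi2 = {stat:.3f}, p = {p_value:.5f}")
```

A small p-value flags a shift in the distribution of that variable between eras, which is one signal that recalibration or retraining may be worth evaluating.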
Krishnamurti et al. (Sun,) studied this question.