The ELDER-ICU model, a machine learning tool for predicting in-hospital mortality in critically ill older adults (≥65 years), was externally validated across 12 international centers in the US, Austria, South Korea, and China. We also assessed three model updating strategies: recalibration, incremental training, and retraining. The model maintained robust performance in the US and Austrian cohorts (AUROC 0.804–0.864), but performance dropped significantly at the Asian sites (South Korea: 0.753; China: 0.698). Incremental training enhanced performance in most centers, while retraining significantly improved AUROC by 0.066 and 0.076 at the South Korean and Chinese sites, respectively. Recalibration with isotonic regression and Platt scaling improved calibration across all centers. This study demonstrates that the robustness of the ELDER-ICU model, and the effectiveness of each updating strategy, varies across temporal shifts, populations, and clinical practice environments. Rigorous validation and proactive model adaptation are essential before clinical deployment in settings with heterogeneous populations and clinical practice.
Duan et al. studied this question.
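The abstract names Platt scaling and isotonic regression as recalibration methods but gives no implementation details. The following is a minimal, dependency-free sketch of what such recalibration could look like, not the authors' code. It assumes synthetic data standing in for a new site where the model is miscalibrated. All function names (`fit_platt`, `fit_isotonic`, `apply_isotonic`, `brier`) are hypothetical, and Platt scaling is applied here to the logit of the predicted probability, which is one common variant.

```python
import math
import random

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(probs, labels, lr=0.5, steps=500):
    """Fit (a, b) so that sigmoid(a * logit(p) + b) matches the labels,
    using plain gradient descent on the log loss."""
    xs = [logit(p) for p in probs]
    a, b, n = 1.0, 0.0, len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def fit_isotonic(probs, labels):
    """Pool-adjacent-violators: returns (upper_bound, value) steps whose
    values are non-decreasing in the predicted probability."""
    blocks = []  # each block holds [sum of labels, count, largest p in block]
    for p, y in sorted(zip(probs, labels)):
        blocks.append([float(y), 1, p])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = upper
    return [(upper, s / c) for s, c, upper in blocks]

def apply_isotonic(iso_steps, p):
    for upper, value in iso_steps:
        if p <= upper:
            return value
    return iso_steps[-1][1]

def brier(probs, labels):
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

# Synthetic "new site" (assumption): the true risk is q, but the model
# reports an overconfident, shifted probability p.
random.seed(0)
true_risk = [random.uniform(0.05, 0.6) for _ in range(1000)]
labels = [1 if random.random() < q else 0 for q in true_risk]
raw = [sigmoid(2.0 * logit(q) + 1.0) for q in true_risk]

a, b = fit_platt(raw, labels)
platt = [sigmoid(a * logit(p) + b) for p in raw]

iso_steps = fit_isotonic(raw, labels)
iso = [apply_isotonic(iso_steps, p) for p in raw]

# For brevity the scores below are computed on the fitting sample; a real
# validation would fit on one split and evaluate on a held-out split.
brier_raw = brier(raw, labels)
brier_platt = brier(platt, labels)
brier_iso = brier(iso, labels)
print(f"Brier raw={brier_raw:.4f} platt={brier_platt:.4f} isotonic={brier_iso:.4f}")
```

Both methods leave the ranking of patients essentially unchanged (isotonic regression can introduce ties), so they target calibration rather than discrimination. This is consistent with the abstract reporting calibration gains from recalibration and AUROC gains from incremental training and retraining.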