Abstract

Background: Predicting mortality among people living with HIV enables clinicians to implement timely, targeted, preventive interventions at the start of antiretroviral therapy (ART). However, prognostic models must rely strictly on baseline predictors to avoid look-ahead bias and ensure scientific validity. This study evaluates machine-learning (ML) algorithms for baseline mortality prediction using routine electronic medical record data.

Objective: This study aimed to predict mortality among people living with HIV receiving ART in public health facilities of Gondar City Administration, Northwest Ethiopia, using ML models trained on baseline clinical and sociodemographic characteristics.

Methods: A retrospective cohort study was conducted using electronic medical record data from 12,871 people living with HIV on ART (2005–2024). Seven base classifiers were evaluated using stratified 10-fold cross-validation. Synthetic minority oversampling technique (SMOTE)-balanced variants were used only for sensitivity analysis; SMOTE was applied only to the training folds, and the final evaluation used the original imbalanced test data. Shapley Additive Explanations (SHAP) analysis identified key baseline predictors.

Results: Gradient boosting on the original data achieved the best performance (accuracy 87.0%, F1-score 0.619, area under the receiver operating characteristic curve [AUC] 0.859), outperforming extreme gradient boosting (F1-score 0.609, AUC 0.835) and the SMOTE variants. In an illustrative individual case, the SHAP analysis identified lack of formal education (+0.84) and a low baseline cluster of differentiation 4 (CD4) immune cell count of 140 cells/mm³ (+0.54) as substantially increasing predicted mortality risk. Urban residence (−0.35) and working functional status (−0.12) showed protective effects, whereas age (45 years; −0.02) had minimal influence in that case. Across the cohort as a whole, lower CD4 counts and the absence of formal education were consistently associated with increased mortality risk.

Conclusions: Ensemble ML models demonstrated moderate-to-strong discrimination for predicting mortality among people living with HIV using strictly baseline routine electronic medical record data. SHAP-based interpretability indicated that educational attainment and baseline CD4 count were the dominant determinants of predicted mortality risk, underscoring the combined influence of socioeconomic vulnerability and immunological status at ART initiation. These findings support the potential utility of interpretable ML models for early risk stratification and targeted clinical decision-making in resource-limited settings; however, external validation is required before routine clinical implementation.
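The evaluation design described in the Methods (stratified cross-validation, with oversampling confined to the training folds) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `stratified_folds`, `smote_like`, and `leakage_safe_splits` are hypothetical, the data are synthetic, and the oversampler is a simplified SMOTE-style interpolation written with the Python standard library only. A real analysis would more likely use an established implementation.

```python
import random


def stratified_folds(y, k, seed=0):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin per class
    return folds


def smote_like(X, y, minority=1, n_neighbors=5, seed=0):
    """Simplified SMOTE-style oversampling (an illustrative stand-in).

    New minority rows are interpolated between a random minority row and
    one of its nearest minority neighbours until the classes are balanced.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    if len(minority_rows) < 2:
        return X_new, y_new
    for _ in range(max(n_needed, 0)):
        a = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not a]
        others.sort(key=lambda r: sum((p - q) ** 2 for p, q in zip(a, r)))
        b = rng.choice(others[:n_neighbors])
        t = rng.random()
        X_new.append([p + t * (q - p) for p, q in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new


def leakage_safe_splits(X, y, k=10, seed=0):
    """Yield (X_train_balanced, y_train_balanced, X_test, y_test) per fold.

    Oversampling touches the training portion only; the held-out fold
    keeps its original, imbalanced class distribution.
    """
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [label for i, label in enumerate(y) if i not in held_out]
        X_bal, y_bal = smote_like(X_train, y_train, seed=seed)
        yield X_bal, y_bal, [X[i] for i in test_idx], [y[i] for i in test_idx]


# Synthetic toy data (NOT study data): 40 events among 200 records.
rng = random.Random(42)
y = [1] * 40 + [0] * 160
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

splits = list(leakage_safe_splits(X, y, k=10))
for fold, (X_bal, y_bal, X_test, y_test) in enumerate(splits, start=1):
    print(fold, "train events:", sum(y_bal), "of", len(y_bal),
          "| test events:", sum(y_test), "of", len(y_test))
```

Because synthetic rows are generated after the split, no held-out record can influence the oversampled training set, so each test fold still reflects the original class imbalance.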
Gedefaw et al. (Mon,) studied this question.
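As a reading aid for the single-patient SHAP values quoted in the Results, the short sketch below ranks the five reported contributions by magnitude and sums them. It uses only the figures stated in the abstract. The abstract does not give the model's base value, the scale of the contributions (for example log-odds versus probability), or whether other features also contributed, so the net figure covers these five features only and cannot be converted into a predicted risk. The dictionary keys are descriptive labels chosen for illustration, not variable names from the study.

```python
# SHAP contributions for the illustrated case, as reported in the abstract.
contributions = {
    "no formal education": 0.84,
    "baseline CD4 count = 140 cells/mm3": 0.54,
    "urban residence": -0.35,
    "working functional status": -0.12,
    "age = 45 years": -0.02,
}

# Rank features by the absolute size of their contribution.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# SHAP values are additive, so contributions can be summed directly.
risk_raising = sum(v for v in contributions.values() if v > 0)
risk_lowering = sum(v for v in contributions.values() if v < 0)
net = risk_raising + risk_lowering  # net shift from these five features only

for name, value in ranked:
    print(f"{name:38s} {value:+.2f}")
print(f"risk-raising total:  {risk_raising:+.2f}")
print(f"risk-lowering total: {risk_lowering:+.2f}")
print(f"net (five features): {net:+.2f}")
```

In this case the two risk-raising features outweigh the three protective ones, which is consistent with the abstract's description of education and CD4 count as the dominant drivers.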