April 22, 2026 · Open Access

Mortality Prediction Among People Living With HIV on Antiretroviral Therapy in Public Health Facilities in Gondar City Administration, Northwest Ethiopia: Machine Learning–Based Study

Key Points

The study used machine learning algorithms to predict mortality among people living with HIV receiving antiretroviral therapy (ART) from baseline predictors.
It was a retrospective cohort study using electronic medical records of 12,871 people living with HIV on ART.
Seven base classifiers were evaluated with stratified 10-fold cross-validation.
SHAP analysis was applied to identify key baseline predictors of mortality risk.
Gradient boosting achieved the highest accuracy (87.0%), outperforming the other models.
Lack of formal education and a low baseline CD4 count were the key predictors, substantially increasing predicted mortality risk.
Urban residence and working functional status showed protective effects against mortality risk.

Abstract

Background: Predicting mortality among people living with HIV enables clinicians to implement timely, targeted, and preventive interventions at the start of antiretroviral therapy (ART). However, prognostic models must rely strictly on baseline predictors to avoid look-ahead bias and ensure scientific validity. This study evaluates machine learning (ML) algorithms for baseline mortality prediction using routine electronic medical record data.

Objective: This study aims to predict mortality among people living with HIV receiving ART using baseline clinical and sociodemographic characteristics through ML models in public health facilities of Gondar City Administration, Northwest Ethiopia.

Methods: A retrospective cohort study was conducted using electronic medical record data from 12,871 people living with HIV on ART (2005–2024). Seven base classifiers were evaluated using stratified 10-fold cross-validation. Synthetic minority oversampling technique (SMOTE)–balanced variants were used only for sensitivity analysis. SMOTE oversampling was applied only to training folds; the final evaluation used the original imbalanced test data. Shapley Additive Explanations (SHAP) analysis identified key baseline predictors.

Results: Gradient boosting on the original data achieved the best performance (accuracy 87.0%, F1-score 0.619, area under the receiver operating characteristic curve 0.859), outperforming extreme gradient boosting (F1-score 0.609, area under the receiver operating characteristic curve 0.835) and the SMOTE variants. The SHAP analysis identified education level (lack of formal education; +0.84) and a low baseline cluster of differentiation 4 (CD4; a type of immune cell) count of 140 cells/mm³ (+0.54) as substantially increasing predicted mortality risk. Urban residence (−0.35) and working functional status (−0.12) showed protective effects, whereas age (45 y; −0.02) had minimal influence in the illustrated case. In the global SHAP analysis, lower CD4 counts and the absence of formal education were consistently associated with increased mortality risk.

Conclusions: Ensemble ML models demonstrated moderate-to-strong discrimination for predicting mortality among people living with HIV using strictly baseline routine electronic medical record data. SHAP-based interpretability confirmed that educational attainment and baseline CD4 count were the dominant determinants of predicted mortality risk, underscoring the combined influence of socioeconomic vulnerability and immunological status at ART initiation. These findings support the potential utility of interpretable ML models for early risk stratification and targeted clinical decision-making in resource-limited settings; however, external validation is required before routine clinical implementation.
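
The resampling protocol described in the Methods (stratified 10-fold cross-validation, with oversampling confined to the training folds and evaluation on the untouched imbalanced test fold) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the synthetic data, and the simplified SMOTE-style interpolation are all assumptions made for illustration, and no classifier is fitted. In practice, library implementations such as scikit-learn's `StratifiedKFold` and imbalanced-learn's `SMOTE` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share per class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like_oversample(X, y, minority=1, n_neighbors=5, seed=0):
    """Simplified SMOTE-style balancing for binary labels (illustrative only).

    Synthetic minority rows are interpolated between a minority row and one
    of its nearest minority neighbours until both classes have equal size.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    if len(minority_rows) < 2 or n_needed <= 0:
        return list(X), list(y)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        # Rank the other minority rows by squared Euclidean distance to base.
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:n_neighbors])
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def cross_validation_splits(X, y, k=10, oversample=False, seed=0):
    """Yield (X_train, y_train, X_test, y_test); only training data is resampled."""
    folds = stratified_folds(y, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        X_train = [X[idx] for idx in train_idx]
        y_train = [y[idx] for idx in train_idx]
        if oversample:
            # Resampling happens after the split, so no synthetic row can
            # carry information from the held-out fold.
            X_train, y_train = smote_like_oversample(X_train, y_train, seed=seed + i)
        X_test = [X[idx] for idx in test_idx]  # original, imbalanced
        y_test = [y[idx] for idx in test_idx]
        yield X_train, y_train, X_test, y_test


# Synthetic demonstration data (made up; not the study's cohort):
# 200 rows with two features, 20% positive class.
demo_rng = random.Random(42)
y_demo = [1] * 40 + [0] * 160
X_demo = [[demo_rng.gauss(label, 1.0), demo_rng.gauss(0.0, 1.0)] for label in y_demo]

splits = list(cross_validation_splits(X_demo, y_demo, k=10, oversample=True))
for X_train, y_train, X_test, y_test in splits:
    train_share = sum(y_train) / len(y_train)
    test_share = sum(y_test) / len(y_test)
    print(f"train positives: {train_share:.2f}, test positives: {test_share:.2f}")
```

The design point is the order of operations: splitting first and oversampling second keeps every test fold at its original class ratio, so reported metrics reflect the imbalance a deployed model would face.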
