August 12, 2025
Open Access

Enhancing end-stage renal disease outcome prediction: a multisourced data-driven approach

Key Points

Models trained on integrated clinical and claims data achieved the highest area under the receiver operating characteristic curve (AUROC), 0.93, outperforming models built on a single data source.
Machine learning and deep learning models enabled effective feature engineering and improved prediction accuracy.
SHAP analysis explains individual predictions, helping to identify key predictors and reduce bias, particularly for African American patients.
This framework supports improved clinical decisions and targeted interventions to mitigate health-care disparities in CKD management.
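
For readers less familiar with the AUROC reported in the key points above, the following is a minimal sketch of how the metric can be computed from predicted risk scores. It is not the study's evaluation code: the function name `auroc` and the toy labels and scores are hypothetical, chosen only for illustration. The sketch uses the rank interpretation of AUROC, that is, the probability that a randomly chosen positive case (here, progression to ESRD) receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg).

    labels: iterable of 0/1 outcomes (1 = event, e.g. progression to ESRD)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Compare every positive score with every negative score.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 of the 4 positive/negative pairs
# are ranked correctly, so the AUROC is 0.75.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(toy_labels, toy_scores))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, which is fine for illustration; a sort-based implementation would be the usual choice for a cohort of several thousand.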

Abstract

This study aimed to improve prediction of chronic kidney disease (CKD) progression to end-stage renal disease (ESRD) by applying machine learning (ML) and deep learning (DL) models to integrated clinical and claims data across varying observation windows, supported by explainable artificial intelligence (AI) to enhance interpretability and reduce bias. We used data from 10 326 CKD patients, combining clinical and claims information from 2009 to 2018. After preprocessing, cohort identification, and feature engineering, we evaluated multiple statistical, ML, and DL models using 5 distinct observation windows. Feature importance and SHapley Additive exPlanations (SHAP) analysis were used to identify key predictors. Models were tested for robustness, clinical relevance, misclassification patterns, and bias. Integrated data models outperformed single data source models, with the long short-term memory (LSTM) model achieving the highest area under the receiver operating characteristic curve (AUROC; 0.93) and F1 score (0.65). A 24-month observation window best balanced early detection and prediction accuracy. The 2021 estimated glomerular filtration rate (eGFR) equation improved prediction accuracy and reduced racial bias, particularly for African American patients. Improved prediction accuracy, interpretability, and bias mitigation strategies have the potential to enhance CKD management, support targeted interventions, and reduce health-care disparities. This study presents a robust framework for predicting ESRD outcomes and improving clinical decision-making through integrated multisourced data and advanced analytics. Future research will expand data integration and extend this framework to other chronic diseases.
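
The abstract refers to "the 2021 eGFR equation" without naming it. Assuming this means the race-free CKD-EPI 2021 creatinine equation, the sketch below shows how eGFR could be computed from serum creatinine, age, and sex. The function name `egfr_ckd_epi_2021` and its parameter names are hypothetical and are not taken from the study; the coefficients are those of the published CKD-EPI 2021 creatinine equation.

```python
def egfr_ckd_epi_2021(scr_mg_dl, age_years, is_female):
    """Estimate GFR (mL/min/1.73 m^2) with the CKD-EPI 2021 creatinine equation.

    Assumption: this is the "2021 eGFR equation" the abstract refers to.
    scr_mg_dl: serum creatinine in mg/dL
    age_years: age in years
    is_female: True for female, False for male
    The equation has no race term.
    """
    kappa = 0.7 if is_female else 0.9          # sex-specific creatinine threshold
    alpha = -0.241 if is_female else -0.302    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa

    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if is_female:
        egfr *= 1.012
    return egfr


# Illustrative inputs only (not study data): eGFR falls as creatinine rises.
for scr in (0.9, 1.5, 3.0):
    print(scr, round(egfr_ckd_epi_2021(scr, 60, False), 1))
```

Because the equation takes only creatinine, age, and sex as inputs, two patients who differ only in recorded race receive the same estimate, which is consistent with the abstract's report of reduced racial bias.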
