What question did this study set out to answer?

The study set out to develop and validate a machine learning model that predicts chronic kidney disease (CKD) progression in adults over 50 years old and that generalizes across ethnicities.

January 22, 2026 | Open Access

A Cross-Ethnicity Validated Machine Learning Model for the Progression of Chronic Kidney Disease in Individuals over 50 Years Old

Key Points

Aimed to develop and validate a machine learning model that predicts chronic kidney disease (CKD) progression across ethnicities.
Trained the model on data from the China Health and Retirement Longitudinal Study (CHARLS, n = 2500).
Validated the model externally in independent cohorts from the English Longitudinal Study of Ageing (ELSA, n = 1200) and the Health and Retirement Study (HRS, n = 1500).
Employed multiple machine learning algorithms, including XGBoost, for model development.
Incorporated composite indicators such as the frailty index (FI) and the triglyceride–glucose (TyG) index during feature engineering.
Achieved an area under the curve (AUC) of 0.892 in the training set with the XGBoost model.
Maintained an AUC of 0.867 in external validation on ELSA and 0.871 on HRS.
Outperformed the Kidney Failure Risk Equation (KFRE), which reached an AUC of 0.745.
Identified the frailty index as the most influential predictor through SHAP analysis.
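
The composite indicators named on this page (frailty index, TyG index, and the AISI mentioned in the abstract) are not defined here. The sketch below assumes the definitions commonly used in the literature; the study may compute them differently, and the function names, units, and deficit coding are illustrative assumptions rather than the authors' implementation.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index, assuming the common form
    ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("lab values must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def aisi(neutrophils, monocytes, platelets, lymphocytes):
    """Aggregate index of systemic inflammation, assuming the common form
    neutrophils * monocytes * platelets / lymphocytes (same count units)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils * monocytes * platelets / lymphocytes


def frailty_index(deficits):
    """Frailty index as the mean of scored health deficits.

    Assumes each deficit is coded from 0 (absent) to 1 (fully present)
    and that None marks an item that was not assessed.
    """
    scored = [d for d in deficits if d is not None]
    if not scored:
        raise ValueError("at least one deficit must be assessed")
    return sum(scored) / len(scored)


# Hypothetical participant values, invented for illustration only.
example_features = {
    "tyg": tyg_index(150.0, 100.0),
    "aisi": aisi(4.0, 0.5, 250.0, 2.0),
    "fi": frailty_index([1, 0, 0.5, None, 1]),
}
```

Features built this way could sit alongside traditional risk factors as extra input columns for the model; how the study handled missing laboratory values or deficit items is not stated on this page.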

Abstract

Background/Objectives: Chronic kidney disease (CKD) is a global public health burden with a rising prevalence driven by population aging. Existing prediction models, such as the Kidney Failure Risk Equation (KFRE), often lack generalizability across ethnicities and omit comprehensive systemic indicators. This study aimed to develop and validate a machine learning model for predicting CKD progression by integrating traditional risk factors with novel composite indicators reflecting systemic health.

Methods: Data from the China Health and Retirement Longitudinal Study (CHARLS, n = 2500) were used for model training. External validation was performed using independent cohorts from the English Longitudinal Study of Ageing (ELSA, n = 1200) and the Health and Retirement Study (HRS, n = 1500). Multiple machine learning algorithms, including XGBoost, were employed. Feature engineering incorporated composite indicators such as the frailty index (FI), triglyceride–glucose (TyG) index, and aggregate index of systemic inflammation (AISI).

Results: The XGBoost model achieved an area under the curve (AUC) of 0.892 in the training set and maintained robust performance in external validation (AUC 0.867 in ELSA, 0.871 in HRS), outperforming the KFRE (AUC 0.745). SHAP analysis identified the FI as the most influential predictor. Decision curve analysis confirmed the model’s clinical utility.

Conclusions: This machine learning model demonstrates high accuracy and cross-ethnicity validity, offering a practical tool for early intervention and personalized CKD management. Future work should address limitations such as the retrospective design and expand validation to underrepresented regions.
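
The abstract reports discrimination as AUC and clinical utility through decision curve analysis without showing how either is computed. As a minimal sketch, assuming binary progression labels and predicted risks between 0 and 1, the block below computes AUC as the probability that a progressor is ranked above a non-progressor, and the net benefit at a single risk threshold, which is the quantity a decision curve plots. The toy labels and scores are invented for illustration and are not study data.

```python
def auc(labels, scores):
    """AUC as the share of (progressor, non-progressor) pairs in which the
    progressor has the higher predicted risk; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit at one risk threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data: 1 = CKD progressed, 0 = did not progress (hypothetical).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]

toy_auc = auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition explicit; for cohorts of the size reported here, a rank-based library routine would be the practical choice. Repeating `net_benefit` over a range of thresholds and comparing against treat-all and treat-none strategies gives the decision curve.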
