What question did this study set out to answer?

This research aims to develop a predictive tool using machine learning to identify diabetic kidney disease in Type 2 diabetes patients.

June 10, 2026 | Open Access

Identifying Diabetic Kidney Disease in Type 2 Diabetes Patients Using Explainable Machine Learning: A Case‐Control Study

Key Points

Developed machine learning prediction models using single-institution data from 1463 patients.
Compared extreme gradient boosting, random forest (RF), support vector machine, and logistic regression.
Employed decision curve analysis and calibration curves to assess clinical utility, and SHapley Additive exPlanations to interpret feature importance.
The full RF model achieved an AUC-ROC of 0.906 and an accuracy of 0.830.
The full RF model significantly outperformed the simplified RF model, which used only routinely available clinical variables.
Key predictors included urine α1-microglobulin, systolic blood pressure, and duration of Type 2 diabetes.
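
The threshold-based metrics quoted for the RF model are all functions of a 2x2 confusion matrix. The sketch below is illustrative only and is not the authors' code: the function names and the confusion-matrix counts are hypothetical, since the study's actual counts are not given here. The one grounded check is that the reported precision (0.794) and recall (0.907) reproduce the reported F1 of 0.847.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F1, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, computed from the row and column totals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
        "kappa": kappa,
    }


# Consistency check on the values reported for the full RF model.
reported_f1 = f1_from_precision_recall(0.794, 0.907)
print(round(reported_f1, 3))  # 0.847

# Hypothetical counts, chosen only to show the calculation.
example = binary_metrics(tp=80, fp=20, fn=10, tn=90)
print({name: round(value, 3) for name, value in example.items()})
```

AUC-ROC and AUC-PR are not covered by this sketch because they depend on the full ranking of predicted probabilities rather than on a single confusion matrix.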

Abstract

OBJECTIVE: This research aimed to establish and test a machine learning-driven predictive tool to assist in identifying diabetic kidney disease (DKD).
METHODS: The prediction models were developed and internally temporally validated using single-institution data. A total of 1463 patients seen at Shaanxi Provincial People's Hospital between March 2023 and September 2024 were included. Least absolute shrinkage and selection operator regression with 10-fold cross-validation was used to select the optimal features. We compared extreme gradient boosting, random forest (RF), support vector machine, and logistic regression across a range of metrics: area under the receiver operating characteristic curve (AUC-ROC), area under the precision-recall curve (AUC-PR), accuracy, precision, recall, kappa, and F1-score. For each algorithm, a simplified model was developed using only routinely available clinical variables and was trained and evaluated on the same datasets as the full model. Decision curve analysis and calibration curves were used to evaluate the clinical utility of the optimal models. Feature importance was analyzed and interpreted via SHapley Additive exPlanations and Local Interpretable Model-agnostic Explanations.
RESULTS: When screening for DKD in Type 2 diabetes, the full RF model achieved the best performance (AUC-ROC = 0.906, AUC-PR = 0.902, accuracy = 0.830, F1 = 0.847, precision = 0.794, recall = 0.907, and kappa = 0.657) and significantly outperformed the simplified RF model. It also showed a favorable clinical net benefit and good calibration. The most influential predictors identified in the full RF model were urine α1-microglobulin, hypertension, 24-h urinary total protein, duration of Type 2 diabetes mellitus, systolic blood pressure, serum retinol-binding protein, complement C1q, and 25-hydroxyvitamin D.
CONCLUSION: An RF prediction model was developed to facilitate early screening of DKD, highlighting the significant roles of specific clinical and laboratory factors in disease prediction.
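
Decision curve analysis summarizes clinical utility as a net benefit at each threshold probability: true positives per patient minus false positives per patient, weighted by the odds of the threshold. A minimal sketch of that standard formula follows. The function names and the eight-patient toy data are assumptions made for illustration, and nothing here reproduces the study's data or implementation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels (1 = DKD) and predicted probabilities, hypothetical values only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # The "treat none" strategy always has a net benefit of 0.
    print(f"threshold={pt:.1f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A model is usually described as offering net benefit over a threshold range when its curve lies above both the treat-all and treat-none reference strategies across that range.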
