OBJECTIVE: This study aimed to develop and test a machine learning-based prediction tool to assist in identifying diabetic kidney disease (DKD).
METHODS: The prediction models were developed using single-institution data and underwent internal temporal validation. A total of 1463 patients from Shaanxi Provincial People's Hospital between March 2023 and September 2024 were included. Least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was used to select the optimal features. We compared extreme gradient boosting, random forest (RF), support vector machine, and logistic regression models on the following metrics: area under the receiver operating characteristic curve (AUC-ROC), area under the precision-recall curve (AUC-PR), accuracy, precision, recall, kappa, and F1-score. For each algorithm, a simplified model was developed using only routinely available clinical variables and was trained and evaluated on the same datasets as the full model. Decision curve analysis and calibration curves were used to evaluate the clinical utility of the optimal models. Feature importance was analyzed and interpreted using SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME).
RESULTS: For DKD screening in patients with type 2 diabetes, the full RF model achieved superior performance (AUC-ROC = 0.906, AUC-PR = 0.902, accuracy = 0.830, F1 = 0.847, precision = 0.794, recall = 0.907, and kappa = 0.657) and significantly outperformed the simplified RF model. It also showed a favorable clinical net benefit and good calibration. The most influential predictors identified in the full RF model were urine α1-microglobulin, hypertension, 24-h urinary total protein, duration of type 2 diabetes mellitus, systolic blood pressure, serum retinol-binding protein, complement C1q, and 25-hydroxyvitamin D.
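The threshold-based metrics listed above (accuracy, precision, recall, F1-score, and kappa) all derive from a single confusion matrix. The following is a minimal sketch of those definitions, not the authors' code. It assumes that "kappa" refers to Cohen's kappa, and the function name `binary_metrics` and the counts are hypothetical values chosen for illustration, not data from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)   # share of predicted positives that are true
    recall = tp / (tp + fn)      # share of true positives that are detected
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa (assumed): observed agreement corrected for the
    # agreement expected by chance from the row and column totals.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = binary_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC and AUC-PR are not covered by this sketch because they are computed from predicted probabilities across all thresholds rather than from one confusion matrix.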
CONCLUSION: An RF prediction model was developed to facilitate early screening of DKD, highlighting the significant roles of specific clinical and laboratory factors in disease prediction.
Qiu et al. studied this question.
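The decision curve analysis mentioned in METHODS rests on the net benefit of a model at a chosen threshold probability, compared with the strategy of treating every patient as positive. The sketch below uses the standard net-benefit formula and is not the authors' implementation. The function names, labels, and predicted probabilities are hypothetical and serve only to illustrate the calculation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying as positive when y_prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = DKD) and predicted probabilities, for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.35, 0.6, 0.2, 0.1, 0.3, 0.55, 0.05]

for pt in (0.25, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt}: model={model_nb:.3f}, treat-all={all_nb:.3f}")
```

A decision curve plots these values over a range of thresholds. A model is considered clinically useful where its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).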