The global prevalence of diabetic kidney disease (DKD) continues to rise, yet early diagnosis remains challenging because of the limitations of existing clinical methods. This study applies machine learning (ML) to pathological diagnostic data, combining feature selection, multi-model comparison, and model evaluation to construct a high-precision early prediction model. This retrospective study enrolled patients from Zhejiang Provincial Hospital of Traditional Chinese Medicine and Hangzhou Hospital of Traditional Chinese Medicine, including individuals newly diagnosed with diabetes mellitus (DM) and those with pathologically confirmed DKD. The resulting ML-compatible dataset comprised 209 cases in the training set and 42 cases in the external validation set. Key predictors were screened using univariate logistic regression and Lasso regression. Models were built with ML algorithms including XGBoost, random forest, and logistic regression; in a comparison of these binary classifiers, the XGBoost model achieved the best diagnostic performance. The model underwent internal cross-validation and external multicenter validation, and its discriminative power and clinical utility were assessed using receiver operating characteristic (ROC) curves, sensitivity and specificity, decision curve analysis (DCA), and calibration curves. (1) Hematological and biochemical parameters differed significantly (P < 0.05) between the DKD and DM groups. Independent risk factors for DKD included the uric acid to high-density lipoprotein cholesterol ratio (UHR), neutrophil to high-density lipoprotein ratio (NHR), systemic immune-inflammatory index (SII), homocysteine (HCY), platelet distribution width (PDW), and albumin (ALB). (2) The XGBoost model, constructed using the key variables, outperformed the other models in both the training and validation sets.
Internal validation yielded an area under the ROC curve (AUC) of 0.83, and external validation an AUC of 0.80. The critical individual predictors HCY, UHR, and NHR had AUC values of 0.79, 0.78, and 0.78, respectively. The model achieved a sensitivity of 0.63, a specificity of 0.90, and an accuracy of 0.77, supporting its diagnostic capability. The XGBoost-based diagnostic prediction model effectively integrates novel inflammatory and metabolic indicators, enabling high-precision early diagnosis of DKD with significant clinical applicability.
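For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, accuracy, and AUC can be computed for a binary classifier. It is an illustration only and not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and the AUC is computed by direct pairwise comparison rather than by any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = DKD, 0 = DM."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: these are NOT values from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35]

# Assumed decision threshold of 0.5 (the abstract does not state the cutoff used).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f} AUC={auc:.4f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can combine a moderate sensitivity with a high specificity at one cutoff while still reporting a single AUC.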
Liu et al. (Wed,) studied this question.