Diabetic kidney disease (DKD) affects approximately 40% of patients with diabetes mellitus and remains a leading cause of end-stage renal disease worldwide. Early diagnosis and identification of therapeutic targets are critical for improving patient outcomes, yet reliable biomarkers are lacking. This study integrated transcriptomic data from the Gene Expression Omnibus (GEO) database (GSE96804, GSE30528, and GSE142025) with machine learning algorithms and Mendelian randomization (MR) to identify diagnostic biomarkers for DKD. Differentially expressed genes (DEGs) were identified and intersected with key modules from weighted gene co-expression network analysis (WGCNA). Four machine learning methods—least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and extreme gradient boosting (XGBoost)—were applied for feature selection. Five hub genes (SPP1, CD44, VCAM1, C3, and TIMP1) were identified at the intersection of these approaches. Two-sample MR analysis using eQTL data from the eQTLGen Consortium and kidney function GWAS from the CKDGen Consortium provided evidence supporting potential causal associations between SPP1, C3, and TIMP1 expression and estimated glomerular filtration rate decline. Immune infiltration analysis via CIBERSORT estimated elevated proportions of M1 macrophages and activated CD4+ memory T cells in DKD samples, with all five hub genes showing correlations with macrophage infiltration. A diagnostic model based on these five genes achieved a cross-validated area under the receiver operating characteristic curve (CV-AUC) of 0.938 in the discovery dataset and AUC values of 0.917 and 0.889 in two independent external validation cohorts. Drug–gene interaction analysis identified 10 candidate compounds targeting the hub genes. 
These findings provide a computational framework for identifying candidate diagnostic biomarkers and generating hypotheses regarding potential therapeutic targets for DKD; however, all results are derived from in silico analyses and require experimental validation—including qPCR, immunohistochemistry, and prospective clinical cohort studies—before clinical applicability can be established.
Liu et al. (Thu,) studied this question.
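Two steps named in the abstract can be sketched compactly: taking the genes shared by all four feature-selection methods, and computing an area under the ROC curve. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the `GENE_A`/`GENE_B`/`GENE_C` placeholders, and the toy scores and labels are hypothetical. Only the five hub gene symbols and the four method names come from the text. The AUC here is the plain rank-based (Mann-Whitney) form computed on a single toy set, whereas the study reports a cross-validated AUC.

```python
def consensus_features(selections):
    """Return the features chosen by every selector, in sorted order."""
    feature_sets = [set(features) for features in selections.values()]
    return sorted(set.intersection(*feature_sets))


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical selector outputs: the GENE_* entries are placeholders,
# not genes reported in the study.
selections = {
    "LASSO": ["SPP1", "CD44", "VCAM1", "C3", "TIMP1", "GENE_A"],
    "RF": ["SPP1", "CD44", "VCAM1", "C3", "TIMP1", "GENE_B"],
    "SVM-RFE": ["SPP1", "CD44", "VCAM1", "C3", "TIMP1", "GENE_A", "GENE_C"],
    "XGBoost": ["SPP1", "CD44", "VCAM1", "C3", "TIMP1", "GENE_B"],
}
hub_genes = consensus_features(selections)

# Toy model scores (made up) for three DKD samples (1) and three controls (0).
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.30, 0.10]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)

print(hub_genes)
print(round(toy_auc, 3))
```

Requiring agreement across all selectors is a deliberately strict rule: a gene that any one method drops is excluded, which trades sensitivity for a shorter, more stable candidate list.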