April 25, 2026 | Open Access

Transcriptomic Profiling Combined with Machine Learning and Mendelian Randomization Identifies Diagnostic Biomarkers and Immune Infiltration Patterns in Diabetic Kidney Disease

Key Points

Aimed to identify reliable diagnostic biomarkers and immune infiltration patterns in diabetic kidney disease (DKD).
Integrated transcriptomic data from the GEO database with machine learning and Mendelian randomization.
Applied four machine learning methods for feature selection: LASSO, random forest, SVM-RFE, and XGBoost.
Analyzed immune infiltration in DKD samples using CIBERSORT.
Identified five DKD-related hub genes: SPP1, CD44, VCAM1, C3, and TIMP1.
Achieved a cross-validated area under the receiver operating characteristic curve (CV-AUC) of 0.938 for the diagnostic model in the discovery dataset.
Estimated elevated proportions of M1 macrophages and activated CD4+ memory T cells in DKD samples; all five hub genes correlated with macrophage infiltration.
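
To make the CV-AUC figure above easier to interpret, here is a minimal sketch of how an AUC can be computed from predicted scores, and how per-fold AUCs might be averaged. It is not the study's code: the function names `roc_auc` and `mean_fold_auc`, the averaging scheme, and all numbers are illustrative assumptions, and the summary does not state how the authors aggregated their folds.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average the AUC over held-out folds.

    `folds` is a list of (labels, scores) pairs, one per held-out fold.
    Averaging per-fold AUCs is one common convention (an assumption here).
    """
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)


# Made-up toy values (1 = DKD, 0 = control); not data from the study.
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
    ([1, 0, 1, 0], [0.6, 0.7, 0.9, 0.2]),
]
print(mean_fold_auc(toy_folds))
```

Read this way, an AUC near 0.94 would mean that a randomly chosen DKD sample scores above a randomly chosen control in roughly 94% of pairs.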

Abstract

Diabetic kidney disease (DKD) affects approximately 40% of patients with diabetes mellitus and remains a leading cause of end-stage renal disease worldwide. Early diagnosis and identification of therapeutic targets are critical for improving patient outcomes, yet reliable biomarkers are lacking. This study integrated transcriptomic data from the Gene Expression Omnibus (GEO) database (GSE96804, GSE30528, and GSE142025) with machine learning algorithms and Mendelian randomization (MR) to identify diagnostic biomarkers for DKD. Differentially expressed genes (DEGs) were identified and intersected with key modules from weighted gene co-expression network analysis (WGCNA). Four machine learning methods—least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine-recursive feature elimination (SVM-RFE), and extreme gradient boosting (XGBoost)—were applied for feature selection. Five hub genes (SPP1, CD44, VCAM1, C3, and TIMP1) were identified at the intersection of these approaches. Two-sample MR analysis using eQTL data from the eQTLGen Consortium and kidney function GWAS from the CKDGen Consortium provided evidence supporting potential causal associations between SPP1, C3, and TIMP1 expression and estimated glomerular filtration rate decline. Immune infiltration analysis via CIBERSORT estimated elevated proportions of M1 macrophages and activated CD4+ memory T cells in DKD samples, with all five hub genes showing correlations with macrophage infiltration. A diagnostic model based on these five genes achieved a cross-validated area under the receiver operating characteristic curve (CV-AUC) of 0.938 in the discovery dataset and AUC values of 0.917 and 0.889 in two independent external validation cohorts. Drug–gene interaction analysis identified 10 candidate compounds targeting the hub genes. 
These findings provide a computational framework for identifying candidate diagnostic biomarkers and generating hypotheses about potential therapeutic targets for DKD. However, all results derive from in silico analyses and require experimental validation, including qPCR, immunohistochemistry, and prospective clinical cohort studies, before clinical applicability can be established.
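
For readers unfamiliar with two-sample MR, the sketch below shows the inverse-variance weighted (IVW) estimator, a commonly used way to combine per-variant Wald ratios. The abstract does not say which MR estimator the authors used, so treating IVW as the method is an assumption; the function name `ivw_estimate` and the input values are hypothetical and not taken from eQTLGen or CKDGen.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the effect of an exposure (e.g. gene
    expression) on an outcome (e.g. eGFR), using first-order weights.

    beta_exposure: per-variant effects on the exposure (eQTL betas)
    beta_outcome:  per-variant effects on the outcome (GWAS betas)
    se_outcome:    standard errors of the outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Wald ratio per variant: outcome effect divided by exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse variance of each ratio, with var ~ se_outcome^2 / beta_exposure^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error


# Made-up toy inputs for two variants; not values from the study.
est, se = ivw_estimate([0.2, 0.4], [-0.02, -0.04], [0.01, 0.01])
print(est, se)
```

A negative estimate in such an analysis would be read as higher genetically predicted expression being associated with a lower outcome value, subject to the usual MR assumptions about instrument validity.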
