August 14, 2025 | Open Access

Exploration and analysis of risk factors for coronary artery disease with type 2 diabetes based on SHAP explainable machine learning algorithm

Key Points

The random forest model exhibited superior performance in predicting coronary artery disease with type 2 diabetes.
SHAP analysis identified Diabetes.History, blood glucose, and HbA1c as the top contributors to CHD-DM2 risk.
Machine learning models were developed on data from 12,400 patients to improve predictive accuracy.
This model aids in clinical decision-making by highlighting critical risk factors for targeted interventions.

Abstract

Type 2 diabetes mellitus (T2DM) is a major risk factor for coronary heart disease (CHD). In recent years, machine learning algorithms have demonstrated significant advantages in improving predictive accuracy; however, studies applying these methods to the clinical prediction and diagnosis of CHD with T2DM (CHD-DM2) remain limited. This study aims to evaluate the performance of machine learning models and to develop an interpretable model that identifies critical risk factors for CHD-DM2, thereby supporting clinical decision-making. Data were collected from cardiovascular inpatients admitted to the First Affiliated Hospital of Xinjiang Medical University between 2001 and 2018. A total of 12,400 patients were included, comprising 10,257 cases of CHD and 2,143 cases of CHD-DM2. To address the class imbalance in the dataset, the SMOTENC algorithm was applied via the themis package during data preprocessing. Final predictors were selected through a combination of univariate analysis and Lasso regression. We then developed and validated seven machine learning models: Logistic, LogisticLasso, KNN, SVM, XGBoost, RF, and LightGBM. The predictive performance of these models was compared using accuracy, sensitivity, specificity, ROC curves with the area under the curve (AUC), and decision curve analysis (DCA). Additionally, SHAP values were used to interpret the model outputs. The dataset was split into a training set (n = 8,460) and a validation set (n = 3,680) at a 7:3 ratio. A total of 25 predictive variables were ultimately retained after Lasso regression analysis. Among the seven machine learning models, the RF model demonstrated significantly superior performance and achieved the highest net benefit in the DCA. According to the SHAP analysis, Diabetes.History, blood glucose (BG), and HbA1c were the top contributors to CHD-DM2 risk and were identified as its primary risk factors. It is recommended that hospitals enhance monitoring of such patients, document the presence of high-risk factors, and implement targeted intervention strategies accordingly.
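The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The Python code below is not the study's code. It shows how sensitivity, specificity, and the net benefit used in decision curve analysis are commonly computed from predicted probabilities at a single threshold. It assumes the standard definition of net benefit, TP/N - (FP/N) x pt/(1 - pt). The function names and the toy labels and probabilities are hypothetical and are not study data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).

    Assumes both classes are present in y_true.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of the model at threshold probability pt (0 < pt < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the reference strategy that treats every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = CHD-DM2, 0 = CHD only (invented for illustration).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(toy_labels, toy_probs, 0.5)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    print(f"net benefit (model)     = {net_benefit(toy_labels, toy_probs, 0.5):.3f}")
    print(f"net benefit (treat all) = {net_benefit_treat_all(toy_labels, 0.5):.3f}")
```

A decision curve repeats the net benefit calculation over a range of thresholds and plots the model against the treat-all and treat-none (net benefit 0) strategies. A model with the "highest net benefit" is one whose curve lies above the alternatives over the thresholds of clinical interest.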
