A random forest machine learning model effectively predicted the 3-year risk of comorbid hypertension in patients with type 2 diabetes, achieving an AUC of 0.890 in internal testing and 0.834 in external validation.
Cohort (n=900)
Yes
Does a random forest machine learning model accurately predict the incidence of new-onset hypertension in adults with type 2 diabetes?
A validated random forest model integrating metabolic, behavioral, and socioeconomic factors effectively predicts the 3-year risk of new-onset hypertension in patients with type 2 diabetes.
Effect estimate: AUC 0.890 (95% CI 0.848-0.932)
p-value: p < 0.05
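The summary reports the AUC with a 95% CI but does not say how the interval was computed. The sketch below shows one common approach: it reads the AUC as the probability that a randomly chosen patient who develops hypertension receives a higher predicted risk than one who does not, and it derives a percentile bootstrap CI. DeLong's method is a frequent alternative. The function names, the bootstrap settings, and the toy scores are hypothetical illustrations, not study data or the authors' code.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case (Mann-Whitney form); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample patients with
    replacement, recompute the AUC, and take the alpha/2 tails."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that lack either class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))
    return stats[k], stats[n_boot - 1 - k]

# Hypothetical toy data: predicted risks and observed outcomes (1 = hypertension).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))  # 12 of 16 positive/negative pairs ordered correctly -> 0.75
print(bootstrap_auc_ci(scores, labels))
```

With only eight toy patients the bootstrap interval is very wide. The narrow interval reported above (0.848 to 0.932) reflects the much larger test cohort.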
Background: Hypertension is a critical comorbidity in patients with type 2 diabetes mellitus and significantly increases cardiovascular risk. Although several predictive models have been developed using conventional logistic regression or basic machine learning algorithms, these approaches have important limitations. Many existing models lack external validation, which limits their generalizability, or operate as black boxes that offer no interpretable clinical insight. Furthermore, most prior studies have focused exclusively on biological indicators, overlooking the potential impact of socioeconomic determinants and lifestyle factors on disease progression.
Objective: To address these gaps, this study aimed to develop a high-performance random forest model for predicting hypertension risk in patients with diabetes by integrating multidimensional data, including clinical metrics, lifestyle habits, and socioeconomic status. The study further sought to validate the model's robustness in an independent external cohort and to support its clinical use through SHAP analysis, which provides transparent interpretations of risk factors to guide personalized medical decision-making.
Methods: A multicenter retrospective cohort study was conducted using electronic medical records from two tertiary hospitals. Adults with type 2 diabetes and no prior hypertension were eligible; 900 patients were included, with 420, 180, and 300 participants in the training, testing, and external validation cohorts, respectively. Feature selection combined the Boruta and LASSO methods and yielded seven predictors. Seven machine learning algorithms were compared, and model performance was assessed through cross-validation, independent testing, and external validation. The random forest model was explained using SHAP (SHapley Additive exPlanations) analysis.
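The development workflow in the Methods (feature selection, random forest training, cross-validation, and a held-out test set) can be sketched with scikit-learn. This is a minimal illustration on synthetic data from `make_classification`, not the authors' pipeline. Several parts are assumptions: L1-penalized logistic regression stands in for LASSO, and Boruta is omitted because it is not part of scikit-learn, so the published combination of the two selectors is not reproduced. The 420/180 split mirrors the reported cohort sizes; all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for development data: 600 patients, 20 candidate features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=7,
                           random_state=0)

# 420 training / 180 testing, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=420, stratify=y, random_state=0)

# Assumed LASSO step: L1-penalized logistic regression with the penalty
# strength chosen by 5-fold CV on the training set only; keep the
# features whose coefficients are nonzero. (Boruta is omitted.)
scaler = StandardScaler().fit(X_train)
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5,
                             scoring="roc_auc")
lasso.fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(lasso.coef_[0] != 0)
if selected.size == 0:  # guard: fall back to all features if none survive
    selected = np.arange(X.shape[1])

# Random forest on the selected features: 5-fold CV AUC on the training
# set, then a single evaluation on the untouched test set.
rf = RandomForestClassifier(n_estimators=300, random_state=0)
cv_auc = cross_val_score(rf, X_train[:, selected], y_train, cv=5,
                         scoring="roc_auc")
rf.fit(X_train[:, selected], y_train)
test_auc = roc_auc_score(y_test, rf.predict_proba(X_test[:, selected])[:, 1])

print(f"{selected.size} features selected")
print(f"CV AUC {cv_auc.mean():.3f}, test AUC {test_auc:.3f}")
```

Fitting the scaler and the selector on the training set alone keeps the test set out of every modeling decision. An external cohort, like the 300-patient validation set, would be scored the same way as `X_test`, without refitting anything.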
Results: Among the 900 participants, the random forest model achieved the best discrimination, with AUCs of 0.89 in internal testing and 0.83 in external validation. Calibration and decision curve analyses confirmed its stability and clinical utility. Key predictors included alcohol consumption, triglycerides, diabetes duration, health insurance type, fasting blood glucose, estimated glomerular filtration rate (eGFR), and exercise frequency.
Conclusion: The validated random forest model, which integrates metabolic, behavioral, and socioeconomic factors, effectively predicts hypertension in patients with type 2 diabetes. Its interpretability and robust performance support its potential use for early identification and personalized prevention of hypertension in clinical practice.
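The decision curve analysis cited in the Results compares a model's net benefit with "treat all" and "treat none" strategies across threshold probabilities. At a threshold p_t, the standard net benefit is NB = TP/n - (FP/n) * p_t / (1 - p_t), where a patient counts as predicted positive when their predicted risk is at least p_t. The sketch below implements that formula. The function names and toy risks are hypothetical, and the source does not report the thresholds the authors examined.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical toy data (1 = developed hypertension); "treat none" is always 0.
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
for t in (0.25, 0.5):
    print(t, net_benefit(probs, labels, t), net_benefit_treat_all(labels, t))
```

At a threshold of 0.5 the toy model flags 3 true and 2 false positives among 8 patients, giving 3/8 - 2/8 = 0.125, while treating everyone gives 0.0. A model shows clinical utility at a given threshold when its curve lies above both reference strategies.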
Yang et al. conducted a cohort study in patients with type 2 diabetes mellitus (n=900). A random forest predictive model was compared with other machine learning algorithms for predicting new-onset hypertension within 3 years (AUC 0.890, 95% CI 0.848-0.932, p < 0.05).