An XGBoost machine learning model predicted CAC categories from EHR data with an AUROC of 0.70 (95% CI 0.67-0.73) and significantly improved CVD risk stratification when added to traditional scores.
Cohort (n=3,705)
Does a machine learning model using EHR data improve cardiovascular risk stratification in patients without prior cardiovascular disease?
Machine learning models using EHR data can predict CAC scores and significantly improve cardiovascular risk stratification beyond traditional risk scores, potentially aiding in statin therapy decisions without imaging.
Effect estimate: AUROC 0.70 (95% CI 0.67-0.73)
Coronary artery calcium (CAC) score is a well-established marker of cardiovascular disease (CVD) risk, yet it remains inaccessible to many patients because of the limited availability of imaging. In this study, we developed and validated machine learning (ML) models to predict CAC scores from electronic health record (EHR) data and examined their utility in stratifying future CVD risk. We included 3,705 patients without a prior history of CVD who underwent CAC scanning. CAC scores were categorized into four groups: 0, 1–100, 101–400, and >400. The data were split into training and test sets (70/30), and four ML models (decision tree, random forest, eXtreme Gradient Boosting [XGBoost], and deep neural network [DNN]) were used to predict CAC categories. XGBoost achieved the highest performance, with an area under the receiver operating characteristic curve (AUROC) of 0.70 (95% CI 0.67–0.73), followed by random forest (0.69; 95% CI 0.67–0.72), DNN (0.68; 95% CI 0.68–0.71), and decision tree (0.67; 95% CI 0.65–0.70). Over a 5-year follow-up, 717 patients (19%) experienced CVD events. Both CT-measured CAC and the XGBoost-derived CAC risk stratification were significantly associated with CVD events. Adding the XGBoost-derived CAC score significantly increased the C statistic from 0.63 to 0.65 (p = 0.01) for the Pooled Cohort Equations and from 0.62 to 0.65 (p = 0.01) for the Framingham Risk Score. The observed Net Reclassification Improvements of 0.1 and 0.2 suggest a clinically meaningful enhancement in patient-level risk classification. Incorporating ML-derived CAC scores offers valuable insights beyond traditional CVD risk assessments and may serve as an imaging-independent tool for personalized cardiovascular prevention. A predicted CAC score of zero may support deferring statin therapy, particularly in resource-limited settings.
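The four CAC groups and the Net Reclassification Improvement (NRI) mentioned above can be illustrated with a minimal sketch. The function names `cac_category` and `categorical_nri` are hypothetical, and the abstract does not state which NRI variant the authors used; the code below assumes the common categorical form, in which upward moves count as improvements for patients with events and downward moves count as improvements for patients without events. Treating scores strictly between 0 and 1 as belonging to the 1–100 group is also an assumption.

```python
def cac_category(score):
    """Map a CAC score to the four groups named in the abstract.

    Returns 0 for a score of 0, 1 for 1-100, 2 for 101-400, 3 for >400.
    Assumption: non-integer scores in (0, 1) fall in group 1.
    """
    if score < 0:
        raise ValueError("CAC score cannot be negative")
    if score == 0:
        return 0
    if score <= 100:
        return 1
    if score <= 400:
        return 2
    return 3


def categorical_nri(old_cat, new_cat, event):
    """Categorical NRI for moving from a baseline to an extended risk model.

    old_cat / new_cat: ordinal risk categories per patient (higher = riskier).
    event: truthy if the patient had an event during follow-up.
    """
    up_e = down_e = n_e = 0  # counts among patients with events
    up_n = down_n = n_n = 0  # counts among patients without events
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy data, invented purely for illustration (not from the study).
old = [0, 0, 1, 1, 1, 1, 0, 0]
new = [1, 1, 1, 0, 0, 0, 0, 1]
evt = [1, 1, 1, 1, 0, 0, 0, 0]
print(categorical_nri(old, new, evt))  # 0.5
```

In the toy data, two of four event patients move up and one moves down (net 0.25), while two of four non-event patients move down and one moves up (net 0.25), giving an NRI of 0.5.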
Takkavatakarn et al. (Fri,) conducted a cohort study of cardiovascular disease risk (n=3,705). A machine learning-derived CAC score (XGBoost) was compared with traditional CVD risk assessments (Pooled Cohort Equations, Framingham Risk Score), with prediction of CAC categories as the evaluated outcome (AUROC 0.70, 95% CI 0.67-0.73).
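To make the reported AUROC and its 95% CI more concrete, here is a minimal sketch of how such an estimate could be computed. It covers only the binary case (for example, CAC = 0 versus CAC > 0) and uses a percentile bootstrap for the interval. Both choices are assumptions: the abstract states neither how the four-class AUROC was aggregated nor how the confidence intervals were derived. The function names `auroc` and `bootstrap_auroc_ci` are hypothetical.

```python
import random


def auroc(labels, scores):
    """Binary AUROC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auroc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUROC undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Toy labels and predicted probabilities, invented for illustration only.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_prob))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for a sketch; a rank-based formulation or a library routine would be preferable for a cohort of several thousand patients.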