What is the clinical evidence from this study?

Study design: Cohort. Population: Patients without prior cardiovascular disease who underwent CAC scanning (n=3705). Intervention: Machine learning-derived CAC score (XGBoost) vs. traditional CVD risk assessments (Pooled Cohort Equations, Framingham Risk Score). Primary outcome: Prediction of CAC categories (AUROC 0.70, 95% CI 0.67-0.73).

What does this research mean for the field?

An XGBoost machine learning model can predict coronary artery calcium (CAC) categories from electronic health record (EHR) data and significantly improves cardiovascular disease (CVD) risk stratification when added to traditional risk scores. Novelty: methodological. Consensus alignment: neutral.

May 1, 2026 · Open Access

Cardiovascular risk stratification using predicted coronary artery calcium score from a machine learning model using EHR data

Key Result

An XGBoost machine learning model predicted CAC categories from EHR data with an AUROC of 0.70 (95% CI 0.67-0.73) and significantly improved CVD risk stratification when added to traditional scores.

Study Design

Type

Cohort (n=3,705)

Structured PICO

Does a machine learning model using EHR data improve cardiovascular risk stratification in patients without prior cardiovascular disease?

Population

3,705 patients without a prior history of CVD who underwent CAC scanning, followed for 5 years.

Exposure

Machine learning models (decision tree, random forest, XGBoost, deep neural network) using electronic health record (EHR) data to predict CAC scores.

Comparator

Traditional cardiovascular disease risk assessments (Pooled Cohort Equations and Framingham Risk Score).

Outcome

Prediction of CAC categories (0, 1–100, 101–400, and >400) and prediction of cardiovascular disease events over a 5-year follow-up. Outcome type: surrogate.
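The four outcome categories amount to a simple binning rule applied to the CT-measured CAC score. The sketch below shows one way to express it; the function name `cac_category` is hypothetical, and treating every positive score up to 100 (including any fractional value) as "1-100" is an assumption, since the source lists only the category bounds.

```python
def cac_category(cac_score: float) -> str:
    """Assign a CAC score to one of the four study categories.

    Hypothetical helper: bins follow the listed bounds 0, 1-100,
    101-400, and >400; any positive score up to 100 is placed in 1-100.
    """
    if cac_score < 0:
        raise ValueError("CAC score cannot be negative")
    if cac_score == 0:
        return "0"
    if cac_score <= 100:
        return "1-100"
    if cac_score <= 400:
        return "101-400"
    return ">400"


if __name__ == "__main__":
    for score in (0, 45, 250, 900):
        print(score, cac_category(score))
```

In the study these categories are the prediction targets, so a model trained on EHR features outputs one of four classes rather than a continuous score.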

Machine learning models using EHR data can predict CAC scores and significantly improve cardiovascular risk stratification beyond traditional risk scores, potentially aiding in statin therapy decisions without imaging.

Main Result

Effect estimate: AUROC 0.70 (95% CI 0.67-0.73)
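AUROC can be read as the probability that a model ranks a randomly chosen positive case above a randomly chosen negative one. The study predicted four CAC categories but does not state how the multi-class AUROC was summarized; the sketch below assumes macro-averaged one-vs-rest AUROC, which is one common choice. All function names are hypothetical. For a binary outcome this quantity is also the C statistic, although the study's C statistics for 5-year CVD events may have been computed with a censoring-aware estimator that this sketch does not implement.

```python
from typing import Sequence


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(probs: Sequence[Sequence[float]],
                    classes: Sequence[int],
                    n_classes: int) -> float:
    """Macro-averaged one-vs-rest AUROC (assumed averaging scheme).

    probs[i][k] is the predicted probability that patient i is in class k;
    classes[i] is the observed class index (e.g. 0..3 for the CAC bins).
    """
    aucs = []
    for k in range(n_classes):
        scores = [row[k] for row in probs]           # score for "class k"
        labels = [1 if c == k else 0 for c in classes]  # class k vs the rest
        aucs.append(binary_auroc(scores, labels))
    return sum(aucs) / n_classes
```

The pairwise loop costs O(n_pos × n_neg), which is acceptable for a test set of roughly 1,100 patients (30% of 3,705) but slower than a sort-based rank computation on large data.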

Abstract

Coronary artery calcium (CAC) score is a well-established marker of cardiovascular disease (CVD) risk, yet it remains inaccessible to many patients because imaging is not widely available. In this study, we developed and validated machine learning (ML) models to predict CAC scores from electronic health record (EHR) data and examined their utility in stratifying future CVD risk. We included 3705 patients without a prior history of CVD who underwent CAC scanning. CAC scores were categorized into four groups: 0, 1–100, 101–400, and >400. The data were split into training and test sets (70/30), and four ML models were used to predict CAC categories: decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and deep neural network (DNN). XGBoost achieved the highest performance, with an area under the receiver operating characteristic curve (AUROC) of 0.70 (95% CI 0.67–0.73), followed by random forest (0.69; 95% CI 0.67–0.72), DNN (0.68; 95% CI 0.68–0.71), and decision tree (0.67; 95% CI 0.65–0.70). Over a 5-year follow-up, 717 patients (19%) experienced CVD events. Both CT-measured CAC and the XGBoost-derived CAC risk stratification were significantly associated with CVD events. Adding the XGBoost-derived CAC score significantly increased the C statistic from 0.63 to 0.65 for the Pooled Cohort Equations (p = 0.01) and from 0.62 to 0.65 for the Framingham Risk Score (p = 0.01). The observed net reclassification improvements (NRIs) of 0.1 and 0.2 suggest a clinically meaningful improvement in patient-level risk classification. Incorporating ML-derived CAC scores adds information beyond traditional CVD risk assessments and may serve as an imaging-independent tool for personalized cardiovascular prevention. A predicted CAC score of zero may support deferring statin therapy, particularly in resource-limited settings.
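The abstract reports NRIs of 0.1 and 0.2 without naming the variant used. The sketch below computes the categorical NRI, assuming each patient is assigned an ordinal risk category under the baseline score and again under the score with the XGBoost-derived CAC added; the function name and the integer category coding are hypothetical, and the study's category cut-points are not given in the source. Upward reclassification counts in favor of the new model among patients with events, downward reclassification among patients without.

```python
from typing import Sequence


def categorical_nri(old_cat: Sequence[int],
                    new_cat: Sequence[int],
                    events: Sequence[int]) -> float:
    """Categorical net reclassification improvement (hypothetical helper).

    old_cat / new_cat: ordinal risk-category index per patient (higher means
    higher risk) under the baseline and the extended model.
    events: 1 if the patient had a CVD event during follow-up, else 0.
    """
    up_e = down_e = n_e = 0     # counts among patients with events
    up_ne = down_ne = n_ne = 0  # counts among patients without events
    for old, new, event in zip(old_cat, new_cat, events):
        if event == 1:
            n_e += 1
            up_e += int(new > old)
            down_e += int(new < old)
        else:
            n_ne += 1
            up_ne += int(new > old)
            down_ne += int(new < old)
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both event and non-event patients")
    nri_events = (up_e - down_e) / n_e          # net correct moves up
    nri_nonevents = (down_ne - up_ne) / n_ne    # net correct moves down
    return nri_events + nri_nonevents
```

A positive value means the extended model moves patients toward the correct risk category more often than away from it; the two components can also be reported separately.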
