What is the clinical evidence from this study?

Study design: Observational. Population: UK Biobank participants assessed for atrial fibrillation and subsequent ischemic stroke (n=454,118). Intervention: XGBoost machine learning model vs. CHA2DS2-VASc score. Primary outcome: Prediction of ischemic stroke in AF patients, measured by AUROC (XGBoost 0.631, 95% CI 0.604-0.657; CHA2DS2-VASc 0.611, 95% CI 0.585-0.638; DeLong's test p=2.20E-06).

October 30, 2022

Open Access

Prediction of atrial fibrillation and stroke using machine learning models in UK Biobank

Key Result

The XGBoost machine learning model significantly improved the prediction of ischemic stroke in patients with atrial fibrillation compared to the clinical CHA2DS2-VASc score (AUROC 0.631 vs 0.611).

Study Design

Type

Observational (n=454,118)

Multicenter

Yes

Structured PICO

Do machine learning models improve the prediction of atrial fibrillation and ischemic stroke in patients with AF compared to traditional clinical tools?

Population

454,118 participants aged 37-73 years from the UK Biobank, from which matched cohorts were derived to develop and validate machine learning models for predicting atrial fibrillation and ischemic stroke.

Exposure

Machine learning models (LightGBM, XGBoost, Random Forest, Deep Neural Network, Support Vector Machine) incorporating clinical features, disease phenotypes, and genetic risk scores.

Comparator

Penalised logistic regression (for AF prediction) and CHA2DS2-VASc score (for ischemic stroke prediction).
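
For readers unfamiliar with the comparator, the following is a minimal sketch of how a CHA2DS2-VASc score is commonly tallied from its standard components. The `Patient` class and its field names are hypothetical, chosen for illustration only; how the study derived each component from UK Biobank fields is not stated in this summary.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative assumptions.
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia: bool = False  # prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the standard CHA2DS2-VASc components (possible range 0 to 9)."""
    score = 0
    score += 1 if p.heart_failure else 0      # C: congestive heart failure
    score += 1 if p.hypertension else 0       # H: hypertension
    if p.age >= 75:                           # A2: age 75 or older counts double
        score += 2
    elif p.age >= 65:                         # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0           # D: diabetes mellitus
    score += 2 if p.prior_stroke_tia else 0   # S2: prior stroke/TIA counts double
    score += 1 if p.vascular_disease else 0   # V: vascular disease
    score += 1 if p.female else 0             # Sc: sex category (female)
    return score


example = Patient(age=70, female=True, hypertension=True)
print(cha2ds2_vasc(example))
```

Because the score is a sum of a few integer-weighted items, it can only take ten distinct values, which limits how finely it can rank patients compared with a model that outputs a continuous risk.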

Outcome

Area under the receiver operating characteristic curve (AUROC) for the prediction of atrial fibrillation (AF) and of ischemic stroke after AF diagnosis.
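
As a reminder of what the outcome measure means, AUROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it that way by brute force; the function name `auroc` and the scores are made up for illustration and are not study data.

```python
def auroc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case is ranked higher.

    Ties contribute 0.5. This is the Mann-Whitney formulation of AUROC.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy predicted risks (illustrative only, not from the study).
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2]
print(auroc(cases, controls))  # 12 of 16 pairs are ordered correctly -> 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.6 to 0.7 indicate modest discrimination.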

Machine learning models, particularly LightGBM and XGBoost, demonstrate improved predictive performance for atrial fibrillation and ischemic stroke in AF patients compared to traditional clinical tools like CHA2DS2-VASc, highlighting the value of incorporating genetic scores and peripheral blood biomarkers.

Main Result

Effect estimate: AUROC 0.631 (95% CI 0.604-0.657) for XGBoost

Comparator estimate: AUROC 0.611 (95% CI 0.585-0.638) for CHA2DS2-VASc

p-value: p=2.20E-06 (DeLong's test, XGBoost vs CHA2DS2-VASc)
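
The reported figures can be sanity-checked with simple arithmetic. The sketch below takes the AUROCs and intervals quoted in the abstract and derives the absolute difference and the standard error each interval would imply, assuming the intervals are symmetric normal-approximation intervals (an assumption; the summary does not state how they were computed). The helper name `implied_se` is hypothetical.

```python
Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_se(lower, upper, z=Z_95):
    """Standard error implied by a symmetric (Wald-type) confidence interval."""
    return (upper - lower) / (2.0 * z)


# Values as reported in the abstract.
xgb_auroc, xgb_ci = 0.631, (0.604, 0.657)
cha_auroc, cha_ci = 0.611, (0.585, 0.638)

diff = xgb_auroc - cha_auroc
se_xgb = implied_se(*xgb_ci)
se_cha = implied_se(*cha_ci)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = xgb_ci[0] <= cha_ci[1] and cha_ci[0] <= xgb_ci[1]

print(f"AUROC difference: {diff:.3f}")
print(f"Implied SE, XGBoost: {se_xgb:.4f}")
print(f"Implied SE, CHA2DS2-VASc: {se_cha:.4f}")
print(f"Intervals overlap: {intervals_overlap}")
```

The two intervals overlap even though the paired test is highly significant. That is expected when both scores are evaluated on the same individuals: the two AUROC estimates are correlated, so the uncertainty of their difference is smaller than the separate intervals suggest.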

Limitations

Models were developed and assessed only in the UK Biobank and might not reflect other datasets with respect to age, sex, and socio-economic status.
Requires further validation across all ancestries as some features vary by ethnicity.

Abstract

We employed machine learning (ML) approaches to evaluate 2,199 clinical features and disease phenotypes available in the UK Biobank as predictors of Atrial Fibrillation (AF) risk. After quality control, 99 features were selected for analysis in 21,279 prospective AF cases and an equal number of controls. Different ML methods were employed, including LightGBM, XGBoost, Random Forest (RF), Deep Neural Network (DNN), and Logistic Regression with L1 penalty (LR). To eliminate the black-box character of the tree-based ML models, we employed Shapley values (SHAP), which estimate the contribution of each feature to AF prediction. The area under the ROC curve (AUROC) values and 95% confidence intervals (CI) per model were: 0.729 (0.719, 0.738) for LightGBM, 0.728 (0.718, 0.737) for XGBoost, 0.716 (0.706, 0.725) for DNN, 0.715 (0.706, 0.725) for RF and 0.622 (0.612, 0.633) for LR. Considering the running time, memory and stability of each algorithm, LightGBM was the best performing among those examined. DeLong's test showed a statistically significant difference in AUROC between penalised LR and the other ML models. Among the most important features identified for LightGBM using SHAP analysis were the genetic risk score (GRS) of AF and age at recruitment. As expected, the AF GRS had a positive impact on the model output, i.e. a higher AF GRS increased AF risk. Similarly, age at recruitment had a positive impact, increasing AF risk.

A secondary analysis was performed for individuals who developed ischemic stroke after AF diagnosis, employing 129 features in 3,150 prospective cases of ischemic stroke after AF and an equal number of controls in UK Biobank. The AUROC values and 95% CI per model were: 0.631 (0.604, 0.657) for XGBoost, 0.620 (0.593, 0.647) for LightGBM, 0.599 (0.573, 0.625) for RF, 0.599 (0.572, 0.624) for support vector machine (SVM), 0.589 (0.562, 0.615) for DNN and 0.563 (0.536, 0.591) for penalised LR. DeLong's test showed no evidence of a significant difference in AUROC between XGBoost and the other ML models examined, with the exception of the penalised LR model (p-value=2.00E-02). Using SHAP analysis for XGBoost, the most important features included age at recruitment and glycated haemoglobin. DeLong's test showed evidence of a statistically significant difference between XGBoost and the current clinical tool for ischemic stroke prediction in AF patients, CHA2DS2-VASc (p-value=2.20E-06), which has an AUROC and 95% CI of 0.611 (0.585, 0.638).
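
The abstract relies on DeLong's test to compare AUROCs of models evaluated on the same individuals. The following is a minimal pure-Python sketch of that test for two correlated AUROCs, run on simulated scores. The function and variable names are hypothetical, the data are randomly generated rather than taken from the study, and the authors' actual implementation is not described in this summary.

```python
import math
import random
from statistics import NormalDist


def _psi(x, y):
    """Pairwise kernel: 1 if the case outranks the control, 0.5 on ties, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _placements(pos, neg):
    """Per-case (V10) and per-control (V01) structural components."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two scores on the same subjects."""
    pos_a = [s for s, y in zip(scores_a, labels) if y == 1]
    neg_a = [s for s, y in zip(scores_a, labels) if y == 0]
    pos_b = [s for s, y in zip(scores_b, labels) if y == 1]
    neg_b = [s for s, y in zip(scores_b, labels) if y == 0]
    v10_a, v01_a = _placements(pos_a, neg_a)
    v10_b, v01_b = _placements(pos_b, neg_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the AUROC difference, including the covariance between models.
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(pos_a)
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(neg_a)
    )
    if var <= 0:
        raise ValueError("zero variance: the two score vectors rank subjects identically")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return auc_a, auc_b, z, p


# Simulated data: 60 cases and 60 controls, model A more informative than model B.
rng = random.Random(0)
labels = [1] * 60 + [0] * 60
scores_a = [0.8 * y + rng.gauss(0, 1) for y in labels]
scores_b = [0.2 * y + rng.gauss(0, 1) for y in labels]

auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
print(f"AUROC A={auc_a:.3f}, AUROC B={auc_b:.3f}, z={z:.2f}, p={p:.4f}")
```

The covariance terms are what distinguish this from comparing two independent AUROCs: when two models tend to rank the same patients similarly, the variance of their difference shrinks, which is how a small AUROC gap can still yield a small p-value.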
