
May 3, 2026 · Open Access

An explainable machine learning framework for cardiovascular risk prediction using structured health data

Key Points

This research aims to develop an interpretable machine learning framework for predicting cardiovascular risk using structured health data.
Used a publicly available dataset of about 70,000 records with various health-related variables.
Implemented three ML models (Logistic Regression, Random Forest Classifier, and Gradient Boosting Classifier), evaluated with five-fold stratified cross-validation.
Used SHAP to improve interpretability by analyzing feature contributions.
The Gradient Boosting model achieved the highest AUC (0.794), with the voting ensemble close behind (0.793).
Overall, the ensemble models outperformed the baseline Logistic Regression model (AUC 0.773), which the authors attribute to their ability to capture nonlinear feature interactions.
Key predictors across all models included age, blood pressure, cholesterol, and weight.
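The evaluation protocol in these key points (three models compared under five-fold stratified cross-validation on several metrics) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (make_classification stands in for the 70,000-record dataset), all hyperparameters are guesses, scaling the logistic regression inputs is an assumption, and the composition of the voting ensemble (soft voting over the three base models) is hypothetical because the summary does not specify it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ~70,000-record dataset (assumption: NOT the study's data).
X, y = make_classification(
    n_samples=1000, n_features=11, n_informative=6, random_state=0
)

SCORING = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def build_models(seed=0):
    """Baseline LR, two tree ensembles, and a soft-voting ensemble (hyperparameters assumed)."""
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    gb = GradientBoostingClassifier(random_state=seed)
    # Soft voting averages the members' predicted probabilities (composition is hypothetical).
    vote = VotingClassifier(
        estimators=[("lr", lr), ("rf", rf), ("gb", gb)], voting="soft"
    )
    return {
        "LogisticRegression": lr,
        "RandomForest": rf,
        "GradientBoosting": gb,
        "Voting": vote,
    }


def evaluate(models, X, y, n_splits=5, seed=0):
    """Mean of each metric over stratified k-fold cross-validation."""
    # Stratification keeps the positive/negative class ratio equal across folds.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # An integer random_state makes every model see the same five splits.
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in SCORING}
    return results


if __name__ == "__main__":
    for name, metrics in evaluate(build_models(), X, y).items():
        print(f"{name:18s} AUC={metrics['roc_auc']:.3f}  F1={metrics['f1']:.3f}")
```

Reusing one StratifiedKFold object keeps the comparison paired: differences in mean AUC reflect the models rather than different train/test partitions.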

Abstract

Background: Cardiovascular disease (CVD) remains one of the leading causes of death worldwide. Because clinical data are complex, machine learning (ML) models are increasingly applied to CVD risk prediction. However, many ML methods lack interpretability, which hinders their adoption in clinical settings. This study introduces an interpretable ML framework for predicting cardiovascular risk from structured clinical data.

Methods: The study used a publicly available cardiovascular dataset of about 70,000 patient records containing demographic, physiological, and lifestyle-related variables commonly used in cardiovascular risk evaluation. Three ML models, LogisticRegression(), RandomForestClassifier(), and GradientBoostingClassifier(), were developed using five-fold stratified cross-validation. Performance was measured with accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). SHAP (SHapley Additive exPlanations) was used to explain both global and local feature contributions and thereby improve interpretability.

Results: All evaluated models showed similar predictive performance, with ensemble-based methods performing best. Gradient Boosting achieved the highest AUC (0.794), followed closely by the Voting Ensemble (0.793); both exceeded the baseline Logistic Regression model (AUC 0.773). The authors attribute the ensembles' advantage mainly to their ability to capture nonlinear interactions between features.

Discussion: The explainability analysis identified age, blood pressure, cholesterol level, and weight as the most influential predictors across all models. By combining explainable artificial intelligence techniques with ML models, these results show how such approaches can make cardiovascular risk prediction more transparent and interpretable. The framework demonstrates the potential of explainable ML to support clinical decision-making and build trust in predictive healthcare models.
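The SHAP analysis described in the abstract rests on Shapley values: each feature's contribution is its average marginal effect on the prediction over all subsets of the other features, and the contributions plus a baseline add up exactly to the model output. The sketch below computes exact Shapley values by brute-force subset enumeration for a fitted GradientBoostingClassifier. It is an illustrative assumption, not the authors' pipeline: it does not use the `shap` package (which relies on faster algorithms such as TreeSHAP), it explains the positive-class probability using a background sample as the reference, and the data and the column names in FEATURES are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def exact_shapley(predict, x, background):
    """Exact Shapley values for one row `x` by enumerating all feature subsets.

    A coalition S is valued as the mean model output over `background` rows
    with the features in S overwritten by x's values (interventional value
    function). Cost grows as 2**n_features, so this suits only a few features.
    """
    n = x.shape[0]

    def value(subset):
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return float(np.mean(predict(data)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi, value(())  # (contributions, baseline = mean background output)


def global_importance(predict, rows, background):
    """Global view: mean absolute Shapley value of each feature over `rows`."""
    phis = np.array([exact_shapley(predict, r, background)[0] for r in rows])
    return np.abs(phis).mean(axis=0)


# Synthetic data; these column names are hypothetical labels, not the study's variables.
FEATURES = ["age", "systolic_bp", "cholesterol", "weight"]
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X, y)


def predict(data):
    """Predicted probability of the positive (at-risk) class."""
    return model.predict_proba(data)[:, 1]


background = X[:50]

if __name__ == "__main__":
    phi, base = exact_shapley(predict, X[0], background)
    print("local:", {f: round(float(v), 3) for f, v in zip(FEATURES, phi)})
    print("baseline:", round(base, 3))
    imp = global_importance(predict, X[:20], background)
    print("global:", {f: round(float(v), 3) for f, v in zip(FEATURES, imp)})
```

For any explained row, the baseline plus the sum of its contributions equals the model's predicted probability for that row; this additivity is what makes a local explanation readable. Averaging absolute contributions over many rows yields the kind of global predictor ranking reported above. Exhaustive enumeration becomes impractical beyond roughly a dozen features, which is why SHAP libraries use approximations or tree-specific algorithms.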

Authors

Valeru Vision Paul

Vellore Institute of Technology University

Jafar Ali Ibrahim Syed Masood

UCSI University

Journals

Frontiers in Artificial Intelligence

SHILAP Revista de lepidopterología

Institutions

Vellore Institute of Technology University
