ABSTRACT
Cardiovascular disease remains a leading cause of global mortality, necessitating the development of reliable, interpretable, and computationally efficient diagnostic support systems. This study proposes a novel ensemble learning framework for heart disease prediction that integrates Principal Component Analysis (PCA) for dimensionality reduction with a weighted soft voting classifier combining Random Forest, XGBoost, and Logistic Regression. The proposed pipeline incorporates robust preprocessing, including imputation, categorical encoding, standardisation, and class balancing via Synthetic Minority Over‐sampling Technique (SMOTE). Performance was evaluated on the Cleveland Heart Disease dataset using 10‐fold stratified cross‐validation, with comprehensive tuning of hyperparameters. The ensemble achieved an F1‐score of 93.3%, accuracy of 93.3%, and an area under the receiver operating characteristic curve of 94.5%, outperforming several recent state‐of‐the‐art models. Detailed ablation studies and interpretability analyses using SHapley Additive exPlanations (SHAP) and feature importance ranking confirmed the critical role of both PCA and ensemble integration. The methodology demonstrates strong generalisation, robustness to noise and missing data, and alignment with clinical interpretability standards. This framework offers a reproducible and transparent approach for deploying machine learning models in diagnostic cardiology.
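The weighted soft voting step named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `weighted_soft_vote`, the weights, and the per-model probabilities below are hypothetical values chosen only to show the mechanics. Each base model's class probabilities are averaged with the given weights, and the class with the highest averaged probability is predicted.

```python
from typing import List, Sequence, Tuple


def weighted_soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> Tuple[List[List[float]], List[int]]:
    """Combine per-model class probabilities by weighted averaging.

    probas[m][i][c] is the probability model m assigns to class c for sample i.
    Returns the averaged probabilities and the predicted class per sample.
    """
    if len(probas) != len(weights):
        raise ValueError("one weight is required per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    averaged: List[List[float]] = []
    labels: List[int] = []
    for i in range(n_samples):
        # Weighted mean of each class probability across all models.
        row = [
            sum(w * model[i][c] for model, w in zip(probas, weights)) / total
            for c in range(n_classes)
        ]
        averaged.append(row)
        # Predict the class with the largest averaged probability.
        labels.append(max(range(n_classes), key=row.__getitem__))
    return averaged, labels


# Hypothetical outputs for two patients (columns: no disease, disease).
rf_proba = [[0.4, 0.6], [0.7, 0.3]]     # stand-in for Random Forest
xgb_proba = [[0.3, 0.7], [0.45, 0.55]]  # stand-in for XGBoost
lr_proba = [[0.6, 0.4], [0.8, 0.2]]     # stand-in for Logistic Regression

avg, pred = weighted_soft_vote([rf_proba, xgb_proba, lr_proba], weights=[2, 2, 1])
print(pred)
```

For the second patient, the XGBoost stand-in alone would favour the disease class, but the weighted average across the three models favours the other class. This shows how soft voting lets confident models outweigh a marginal one. The weights are placeholders; the abstract does not say how they were chosen.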
Narayan Jee
Gesu Thakur
Sumit Kumar
Digital Twins and Applications
Indian Institute of Technology Roorkee
Patanjali Research Foundation
Jee et al. studied this question.
synapsesocial.com/papers/69cf5cb15a333a821460a45d — DOI: https://doi.org/10.1049/dgt2.70019