Machine learning models are a powerful tool for predicting heart disease. However, class imbalance in datasets, where healthy individuals far outnumber those with heart disease, can markedly reduce the effectiveness of these models. This study presents a machine learning pipeline that combines resampling methods, namely SMOTE, ADASYN, and Random Oversampling (ROS), with commonly used classifiers: Random Forest (RF), k-Nearest Neighbors (kNN), Gradient Boosting, and AdaBoost. Using the CDC's 2022 Indicators of Heart Disease dataset, we evaluate these methods in terms of prediction accuracy, precision, recall, F1-score, and AUC. The findings show that RF with ROS achieves the highest overall performance, including in comparison with several previous studies, with 95.75% accuracy, 99.84% recall, 95.91% F1-score, and 99.59% AUC. These results illustrate that oversampling approaches can correct class imbalance and improve heart disease prediction.
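To make the idea of Random Oversampling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_oversample`, the fixed seed, and the toy data are all assumptions made for illustration, and the abstract does not say how the resampling was configured.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample with replacement until the class reaches the target size.
        for _ in range(target - n):
            j = rng.choice(idx)
            out_x.append(features[j])
            out_y.append(cls)
    return out_x, out_y


# Toy data invented for illustration: 8 healthy rows (0), 2 heart-disease rows (1).
X = [[i, i % 3] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
balanced_counts = Counter(y_res)
```

In practice, resampling is usually applied only to the training split, so that duplicated minority rows do not leak into the data used for evaluation. Whether the study did this is not stated in the abstract.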
Rahardi et al. (Wed,) studied this question.