June 4, 2025 | Open Access

Optimizing Machine Learning Models for Class Imbalance in Heart Disease Prediction

Abstract

Machine learning models are a powerful tool for predicting heart disease. However, class imbalance in datasets, where healthy individuals far outnumber those with heart disease, can markedly degrade model performance. This study presents a machine learning pipeline that combines resampling methods, including SMOTE, ADASYN, and Random Oversampling (ROS), with commonly used classifiers such as Random Forest (RF), k-Nearest Neighbors (kNN), Gradient Boosting, and AdaBoost. Using the CDC's 2022 Indicators of Heart Disease dataset, we evaluate these methods in terms of prediction accuracy, precision, recall, F1-score, and AUC. Compared with several previous studies, RF with ROS achieves the highest overall performance, with 95.75% accuracy, 99.84% recall, 95.91% F1-score, and 99.59% AUC. The findings illustrate the effectiveness of oversampling approaches in addressing class imbalance and improving heart disease prediction.
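
To make the resampling step concrete, the following is a minimal sketch of random oversampling and of the precision, recall, and F1 metrics named in the abstract. It is not the authors' pipeline: the function names `random_oversample` and `precision_recall_f1` and the toy data are hypothetical, and a real study would more likely rely on a library implementation such as imbalanced-learn's `RandomOverSampler`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 9 healthy rows (0) and 1 heart-disease row (1).
X = [[i] for i in range(10)]
y = [0] * 9 + [1]
X_res, y_res = random_oversample(X, y)
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used to compute the evaluation metrics.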
