The global cost of credit card fraud continues to rise, driven by increasingly concentrated and sophisticated attacks, which underscores the need for more effective detection and prevention methods. Machine learning approaches to this problem have advanced significantly in recent years. This paper provides an overview and comparison of two groups of models: traditional supervised learning models, such as Logistic Regression, Decision Trees, and Support Vector Machines (SVM), and ensemble methods, such as Random Forest, Gradient Boosting, and XGBoost. Because credit card fraud datasets are highly imbalanced, the study also examines the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on classification performance. While SMOTE has been shown to improve the performance of weaker classifiers, its benefits for advanced ensemble methods remain less clear. Consequently, this paper identifies which models benefit most from oversampling and assesses whether high-performing classifiers can mitigate the effects of imbalance without data augmentation. In the comparison, Random Forest and XGBoost demonstrated superior performance both with and without SMOTE. Without SMOTE, Logistic Regression and SVM yielded high accuracy but near-zero performance on key classification metrics, highlighting their inability to detect minority-class instances effectively.
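The gap between high accuracy and near-zero minority-class performance can be illustrated with a minimal sketch. The class counts below (10,000 transactions, 20 of them fraudulent) are hypothetical values chosen for illustration, not figures from this study, and the helper names `confusion_counts` and `metrics` are likewise assumptions. The "classifier" here simply labels every transaction as legitimate, which is roughly how a model behaves when it fails to learn the minority class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (fraud) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Undefined ratios (no positive predictions, or no positives) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 20 fraudulent and 9,980 legitimate transactions.
y_true = [1] * 20 + [0] * 9980

# A degenerate model that predicts "legitimate" for every transaction.
y_pred = [0] * len(y_true)

scores = metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the degenerate model is correct on every legitimate transaction and wrong on every fraudulent one, so its accuracy is very high while its recall and F1 for the fraud class are zero. This is why accuracy alone is a poor yardstick on imbalanced data and why resampling methods such as SMOTE are evaluated on minority-class metrics.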
Liying Li (Tue,) studied this question.