What type of study is this?

This is a quantitative study.

September 18, 2025

Machine Learning Approaches for Credit Card Fraud Detection in Severely Imbalanced Datasets: A Comparative Analysis of Classification and Anomaly Detection Methods

Key Points

Ensemble-based models performed best; Gradient Boosting in particular reached an AUC-ROC of 0.956 and an AUC-PR of 0.378 with perfect recall.
Feature analysis identified the anonymized PCA-derived variables V14, V10, and V12 as the most discriminative indicators of fraudulent activity.
A decision threshold of 0.9 minimized operational costs at $2,985 while maintaining full recall, for an estimated annual net benefit of $68,985.
The study highlights the necessity of cost-sensitive evaluation metrics and business-driven threshold calibration for effective fraud prevention systems.
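As a rough illustration of business-driven threshold calibration, the sketch below sweeps candidate thresholds over model scores and keeps the one with the lowest total cost. It is not the authors' implementation: the function names, the toy labels and scores, and the per-error costs (`FN_COST`, `FP_COST`) are all hypothetical values chosen for the example.

```python
from typing import List, Sequence, Tuple


def total_cost(y_true: Sequence[int], scores: Sequence[float],
               threshold: float, fn_cost: float, fp_cost: float) -> float:
    """Cost of missed frauds (false negatives) plus false alarms (false positives)."""
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return fn * fn_cost + fp * fp_cost


def recall(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of fraudulent transactions flagged at the given threshold."""
    positives = sum(y_true)
    caught = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    return caught / positives if positives else 0.0


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   thresholds: Sequence[float],
                   fn_cost: float, fp_cost: float) -> Tuple[float, float]:
    """Return the (threshold, cost) pair with the lowest total cost."""
    costs: List[Tuple[float, float]] = [
        (t, total_cost(y_true, scores, t, fn_cost, fp_cost)) for t in thresholds
    ]
    return min(costs, key=lambda pair: pair[1])


# Hypothetical toy data: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.55, 0.60, 0.80, 0.92, 0.30, 0.95]
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical costs: a missed fraud is assumed far costlier than a false alarm.
FN_COST = 100
FP_COST = 5

threshold, cost = best_threshold(y_true, scores, thresholds, FN_COST, FP_COST)
print(threshold, cost, recall(y_true, scores, threshold))
```

With these toy scores both fraud cases sit above every legitimate transaction, so the highest threshold removes all false alarms without losing recall. On real data the cost curve usually trades the two error types against each other, which is why the cost ratio has to come from the business rather than from the model.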

Abstract

Credit card fraud presents a persistent threat to financial institutions, exacerbated by the rise of digital payments and the complexity of fraudulent schemes. This study investigates machine learning (ML) approaches for fraud detection in severely imbalanced datasets, focusing on three key objectives: comparing classification and anomaly detection models under extreme class imbalance, identifying transaction features with the highest discriminative power, and optimizing decision thresholds using cost-sensitive evaluation to minimize business impact. Using a dataset of 999 transactions with a fraud rate of 0.2% (498.5:1 imbalance), we implemented supervised methods (logistic regression, random forest, gradient boosting) and unsupervised anomaly detection methods (Isolation Forest, One-Class SVM, Local Outlier Factor). Results show that ensemble-based models, particularly Gradient Boosting, achieved superior performance (AUC-ROC = 0.956; AUC-PR = 0.378) with perfect recall and improved precision relative to other methods. Feature analysis identified anonymized PCA-derived variables (V14, V10, V12) as the most discriminative indicators of fraudulent activity. Threshold optimization at 0.9 minimized operational costs ($2,985) while maintaining full recall, yielding an estimated annual net benefit of $68,985 and a return on investment of 186.7%. This study contributes to the literature by integrating algorithm benchmarking, feature importance evaluation, and cost-sensitive threshold optimization in an end-to-end fraud detection framework. The findings underscore the importance of ensemble learning, imbalanced evaluation metrics (AUC-PR, precision, recall), and business-driven threshold calibration for developing effective and economically viable fraud prevention systems. Future research should explore larger datasets, adaptive learning to address concept drift, and explainable AI techniques to enhance interpretability and regulatory compliance.
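The imbalance figures in the abstract can be checked with a few lines of arithmetic. The sketch below recovers the class counts implied by 999 transactions at a 498.5:1 ratio and compares the reported AUC-PR with the prevalence, which is commonly used as the approximate AUC-PR of a no-skill classifier. The variable names are illustrative, and the baseline comparison is an assumption added here, not a calculation taken from the paper.

```python
# Figures reported in the abstract.
TOTAL_TRANSACTIONS = 999
IMBALANCE_RATIO = 498.5      # legitimate transactions per fraudulent one
REPORTED_AUC_PR = 0.378      # Gradient Boosting

# total = fraud + legit and legit = ratio * fraud, so fraud = total / (ratio + 1).
fraud_count = TOTAL_TRANSACTIONS / (IMBALANCE_RATIO + 1)
legit_count = TOTAL_TRANSACTIONS - fraud_count

# Prevalence of the positive (fraud) class, about 0.2%.
prevalence = fraud_count / TOTAL_TRANSACTIONS

# Assumption: a no-skill classifier has an AUC-PR close to the prevalence,
# so this ratio gives a rough sense of improvement over chance.
lift_over_baseline = REPORTED_AUC_PR / prevalence

print(f"fraud cases: {fraud_count:.0f}, legitimate cases: {legit_count:.0f}")
print(f"prevalence: {prevalence:.4%}")
print(f"AUC-PR relative to no-skill baseline: {lift_over_baseline:.1f}x")
```

The reported ratio implies 2 fraudulent and 997 legitimate transactions. This also shows why the abstract stresses AUC-PR alongside AUC-ROC: at this prevalence, precision-based metrics are measured against a baseline near 0.002 rather than 0.5.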

Authors

Sezai Tunca

Alanya Hamdullah Emin Pasa University

Yavuz Selim Balcıoğlu

Doğuş University

Ceren Çubukçu Çerasi

Gebze Technical University

Journals

International Journal of Basic and Applied Sciences
