Financial fraud is one of the most critical operational risks facing financial institutions, causing significant losses and destabilizing markets. While machine learning models are effective at prediction, they are usually evaluated with statistical performance metrics that do not translate directly into financial impact. This study develops an evaluation framework that integrates the costs of early fraud detection with predictive effectiveness and economic criteria for decision-making. Several supervised learning models (XGBoost, neural networks, Random Forest, decision trees, and logistic regression) were trained and tested on an imbalanced dataset of credit card transactions. To assess the potential benefit of these models for financial institutions, the savings rate and expected loss were used alongside conventional metrics such as F1 score, AUC-PR, AUC-ROC, recall, and accuracy. The results show that economic outcomes vary widely even among models with similar predictive performance. Ensemble methods in particular achieved the best balance between fraud detection capability and loss reduction, whereas models optimized solely for accuracy incurred higher operating costs due to false positives or undetected fraud. These findings indicate that fraud detection models should be chosen not on predictive accuracy alone, but also with regard to cost asymmetry and risk tolerance. The proposed framework offers practical guidance to financial institutions seeking to align machine learning implementation with operational risk management and regulatory requirements, enabling risk-informed decision-making.
Condori et al. studied this question.
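The abstract does not define the expected loss or the savings rate, so the following Python sketch assumes a simple cost matrix that is common in cost-sensitive fraud detection and is not necessarily the one the authors used. A missed fraud is assumed to cost the full transaction amount, and every flagged transaction is assumed to cost a fixed review fee. The function names `decision_loss` and `savings_rate`, the `review_cost` parameter, and the toy data are all hypothetical. The example shows how two models with identical accuracy can differ sharply in economic outcome.

```python
from typing import Sequence


def decision_loss(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    amounts: Sequence[float],
    review_cost: float = 5.0,
) -> float:
    """Total monetary loss of a set of fraud decisions.

    Assumed cost matrix (an illustration, not taken from the paper):
      - flagged transaction (true or false positive): fixed review_cost
      - missed fraud (false negative): the full transaction amount
      - correctly ignored legitimate transaction: no cost
    """
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            total += review_cost
        elif truth == 1:
            total += amount
    return total


def savings_rate(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    amounts: Sequence[float],
    review_cost: float = 5.0,
) -> float:
    """Fraction of the no-model loss that the model avoids.

    The baseline is the loss when nothing is flagged, i.e. every
    fraudulent amount is lost.
    """
    no_model = [0] * len(y_true)
    baseline = decision_loss(y_true, no_model, amounts, review_cost)
    if baseline == 0:
        return 0.0  # no fraud in the sample, so nothing to save
    return 1.0 - decision_loss(y_true, y_pred, amounts, review_cost) / baseline


# Toy data: five transactions, two of them fraudulent (label 1).
y_true = [1, 0, 1, 0, 0]
amounts = [200.0, 50.0, 20.0, 30.0, 10.0]

# Both hypothetical models make exactly one mistake (accuracy 0.8).
pred_a = [1, 0, 0, 0, 0]  # catches the large fraud, misses the small one
pred_b = [0, 0, 1, 0, 0]  # catches the small fraud, misses the large one

for name, pred in (("A", pred_a), ("B", pred_b)):
    loss = decision_loss(y_true, pred, amounts)
    rate = savings_rate(y_true, pred, amounts)
    print(f"model {name}: loss={loss:.2f}, savings rate={rate:.3f}")
```

Under these assumptions, model A loses 25 and model B loses 205 against a no-model baseline of 220, even though both classify four of five transactions correctly. A different cost matrix, for example one that charges chargeback fees or customer-friction costs for false positives, would change the numbers but not the point that accuracy alone hides the cost asymmetry.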