This article examines machine learning methods for detecting fraudulent transactions in banking systems. The work is motivated by the growing volume of financial operations, the increasing complexity of fraud schemes, and the limited effectiveness of traditional controls based on fixed rules. Given the pronounced class imbalance, in which fraud is the rare class, the detection task calls for adaptive classification approaches. The aim of the research is to analyze how machine learning methods can improve the identification of suspicious transactions. The study applied data preprocessing and analysis methods, including feature transformation, categorical variable encoding, and correlation analysis. For the classification task, the Random Forest and XGBoost (gradient boosting) algorithms were compared, followed by an analysis of the classification threshold. The scientific novelty of the work lies in investigating the impact of the classification threshold on the quality of fraudulent transaction detection under imbalanced data.
It was established that the gradient boosting algorithm (XGBoost) gives more stable results and is more robust on the rare class than Random Forest, which the study attributes to its better handling of nonlinear dependencies among features and its greater sensitivity to the rare class. It was shown that adjusting the classification threshold regulates the balance between recall in fraud detection and the number of false positives: recall can be increased while false positives are kept at a controlled level, depending on the practical requirements of the anti-fraud system. The practical significance of the research lies in the applicability of the proposed approach to monitoring banking operations, improving the efficiency of identifying suspicious transactions, and supporting decision-making in anti-fraud and fraud prevention systems.
Kapitonova et al. (Sun,) studied this question.
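The threshold trade-off described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the scores and labels are made-up values standing in for the fraud probabilities that a trained classifier (for example, a gradient boosting model) might output, and the helper names `confusion_at`, `recall_and_fpr`, and `pick_threshold` are hypothetical. The sketch assumes that "controlled levels of false positives" is expressed as an upper bound on the false positive rate.

```python
# Hypothetical fraud probabilities from some trained classifier.
# These values are invented for illustration; they are NOT results from the study.
scores = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60, 0.90]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]  # 1 = fraud (rare class), 0 = legitimate


def confusion_at(threshold, scores, labels):
    """Return (TP, FP, FN, TN) when every score >= threshold is flagged as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_and_fpr(threshold, scores, labels):
    """Return (recall on the fraud class, false positive rate) at a threshold."""
    tp, fp, fn, tn = confusion_at(threshold, scores, labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


def pick_threshold(scores, labels, max_fpr, candidates):
    """Pick the candidate with the highest recall whose FPR stays within max_fpr.

    Ties in recall are broken in favour of the lower false positive rate.
    Returns None if no candidate satisfies the false positive budget.
    """
    best_threshold = None
    best_key = None
    for threshold in candidates:
        recall, fpr = recall_and_fpr(threshold, scores, labels)
        if fpr > max_fpr:
            continue  # too many false alarms for this budget
        key = (recall, -fpr)
        if best_key is None or key > best_key:
            best_threshold, best_key = threshold, key
    return best_threshold


if __name__ == "__main__":
    for threshold in (0.5, 0.4, 0.3, 0.2):
        recall, fpr = recall_and_fpr(threshold, scores, labels)
        print(f"threshold={threshold:.1f}  recall={recall:.2f}  FPR={fpr:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches the one fraud case that the default threshold misses, at the cost of two additional false alarms. In practice the thresholds would be evaluated on a held-out validation set, and the false positive budget would come from the operational requirements of the anti-fraud system.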