Financial fraud and money laundering continue to challenge financial stability and regulatory oversight, motivating the widespread adoption of machine learning models for transaction monitoring. Although ensemble models such as Random Forest and XGBoost achieve strong predictive performance, their deployment in high-stakes financial environments is constrained by limited interpretability, overconfident predictions, and the absence of principled mechanisms for expressing decision uncertainty. Regulatory expectations increasingly emphasise transparency, accountability, and operational reliability, underscoring the need for evaluation frameworks that extend beyond predictive accuracy. This study proposes the Integrated Transparency and Confidence Framework (ITCF), a deployment-oriented approach that unifies model explainability, statistically valid uncertainty quantification, and operational decision support for fraud detection. ITCF combines instance-level explanations generated via Local Interpretable Model-Agnostic Explanations (LIME) with distribution-free uncertainty estimation using split conformal prediction. The framework incorporates selective explainability, abstention-based routing, and uncertainty-driven triage to support human-in-the-loop review. Using the PaySim dataset of 6,362,620 mobile-money transactions, Random Forest and XGBoost models are evaluated under extreme class imbalance using F1-score, AUC–ROC, and Matthews Correlation Coefficient (MCC). At a target coverage level of 90% (α = 0.1), both models achieve empirical coverage close to the target, with XGBoost producing smaller prediction sets, higher recall and MCC, and lower latency. ITCF provides transaction-level explanations for uncertain cases and specifies an auditable workflow intended to support transparency, traceability, risk-aware human review, and defensible human decision-making in regulated environments.
Overall, this study illustrates how explainability and uncertainty quantification can be combined in a deployment-oriented evaluation workflow, while noting that real-world validation remains future work.
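As an illustration of the split conformal prediction and abstention-based routing described above, the following is a minimal sketch rather than the study's implementation. It assumes a binary classifier that outputs class probabilities, uses the common nonconformity score of one minus the probability of the true class, and routes every non-singleton prediction set to human review. The function names (`conformal_threshold`, `prediction_sets`, `route`) and the routing rule are hypothetical choices made for this example.

```python
import math
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on a held-out calibration split.

    Assumed nonconformity score: 1 - p(true class). The threshold is the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score, which gives
    marginal coverage of at least 1 - alpha under exchangeability.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set for row i."""
    return (1.0 - np.asarray(probs, dtype=float)) <= qhat


def route(pred_sets):
    """Hypothetical triage rule: singleton sets are decided automatically,
    empty or multi-class sets are sent to a human reviewer."""
    sizes = pred_sets.sum(axis=1)
    labels = pred_sets.argmax(axis=1)
    return [
        ("auto", int(label)) if size == 1 else ("review", None)
        for size, label in zip(sizes, labels)
    ]


if __name__ == "__main__":
    # Synthetic probabilities for illustration only, not PaySim results.
    rng = np.random.default_rng(0)
    cal_labels = rng.integers(0, 2, size=500)
    p_true = rng.beta(8, 2, size=500)  # probability assigned to the true class
    cal_probs = np.where(
        np.arange(2)[None, :] == cal_labels[:, None],
        p_true[:, None],
        1.0 - p_true[:, None],
    )
    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
    test_probs = np.array([[0.97, 0.03], [0.55, 0.45], [0.10, 0.90]])
    print(route(prediction_sets(test_probs, qhat)))
```

In a workflow of the kind the abstract outlines, only the transactions routed to `"review"` would then receive a LIME explanation, which is one way to realise selective explainability. That coupling is an assumption of this sketch.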
Mapaila et al. (Fri,) studied this question.