Financial institutions face growing regulatory pressure to detect and report money laundering accurately, auditably, and fairly. This study introduces a reproducible machine learning pipeline for Anti-Money Laundering (AML) detection that combines statistically validated synthetic data generation, class-imbalance handling, and post-hoc explainability. Using a 10,000-record synthetic AML dataset generated with the Synthetic Data Vault (SDV) and Faker, we train Random Forest and Multilayer Perceptron classifiers with class weighting and F2-optimized threshold tuning. Our primary objective is to maximize recall on suspicious transactions while keeping alert volumes operationally manageable, reflecting the regulatory expectation that missed suspicious activity be minimized. We evaluate model performance using PR-AUC, precision and recall for the suspicious class, F1 score, the Matthews correlation coefficient (MCC), balanced accuracy, and probability calibration. Global and local interpretability are provided by TreeSHAP and KernelSHAP, which let analysts inspect feature contributions and diagnose false positives and false negatives. Fairness audits across age and regional proxy attributes reveal Equal Opportunity gaps (differences in true positive rates between groups), which we mitigate through post-processing threshold adjustments. The results show substantially higher AML recall at regulatory-compliant operating points, with better precision than simple baselines, and the pipeline produces transparent, auditable outputs aligned with the Bank Secrecy Act and FATF guidance. This work offers U.S. financial institutions a deployable framework that improves compliance efficiency, supports supervisory review, and enables replication and industry benchmarking.
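The abstract names two evaluation steps that are easy to state precisely: choosing a decision threshold that maximizes F2 (the F-beta score with beta = 2, which treats recall as twice as important as precision), and measuring an Equal Opportunity gap across groups. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, candidate thresholds are assumed to be the observed scores, ties are assumed to favor the lowest (highest-recall) threshold, and the gap is assumed to be the maximum minus the minimum true positive rate across groups. In practice, `sklearn.metrics.fbeta_score(y_true, y_pred, beta=2)` computes the same F2 value.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta for the positive (suspicious) class: (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def tune_threshold_f2(
    y_true: Sequence[int], scores: Sequence[float], beta: float = 2.0
) -> Tuple[float, float]:
    """Hypothetical helper: scan observed scores as thresholds and keep the best F-beta.

    Thresholds are visited in ascending order and only a strictly better score
    replaces the current best, so ties resolve to the lowest threshold (more alerts,
    higher recall).
    """
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = fbeta(y_true, preds, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TPR (recall) within one group; NaN if the group has no positive labels."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(pos_preds) / len(pos_preds) if pos_preds else float("nan")


def equal_opportunity_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Tuple[float, Dict[Hashable, float]]:
    """Assumed definition: max TPR minus min TPR across groups (e.g. age or region proxies)."""
    tprs: Dict[Hashable, float] = {}
    for g in set(groups):
        idx: List[int] = [i for i, gi in enumerate(groups) if gi == g]
        tprs[g] = true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
    valid = [v for v in tprs.values() if v == v]  # drop NaN (groups without positives)
    gap = max(valid) - min(valid) if valid else 0.0
    return gap, tprs


if __name__ == "__main__":
    # Toy labels and model scores (illustrative only, not data from the study).
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]
    threshold, f2 = tune_threshold_f2(y, s)
    preds = [1 if v >= threshold else 0 for v in s]
    gap, per_group = equal_opportunity_gap(y, preds, ["a"] * 4 + ["b"] * 4)
    print(threshold, round(f2, 3), gap, per_group)
```

A post-processing mitigation of the kind the abstract describes could, for example, rerun the threshold search separately for each group so that group TPRs move closer together; whether the study does exactly this is not stated here.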
Pristly Turjo Mazumder (Sun,) studied this question.