This study applies credit strategy optimization at the transaction authorization layer, where each transaction receives one of three actions: approve, review, or decline. Using Conservative Q-Learning (CQL), an offline reinforcement learning method, we jointly optimize fraud loss, the operational burden of manual reviews, and customer friction from false positives and delays through a unified multi-objective cost function. On a public credit-card transaction dataset with severe class imbalance, the learned policy achieves lower total cost than cost-sensitive supervised baselines and offers favorable trade-offs along a Pareto frontier between risk, operations, and friction. We detail the MDP design (state featurization, action space, and cost weights) and show that CQL mitigates the overestimation of values for out-of-distribution actions in offline settings. The results indicate that conservative RL is a practical path for transaction-level credit decision-making that balances fraud risk with operational efficiency and user impact.
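To make the two central ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one possible per-transaction cost that sums weighted fraud, operations, and friction terms, and the standard discrete-action CQL regularizer (log-sum-exp of Q-values minus the Q-value of the logged action). The abstract does not give the cost weights or cost magnitudes, so every constant below (`W_FRAUD`, `REVIEW_COST`, and so on) is a hypothetical placeholder, as are the function names. The sketch also assumes that a manual review always catches fraud, which the source does not state.

```python
import numpy as np

ACTIONS = ("approve", "review", "decline")

# Hypothetical weights and unit costs; the abstract does not report values.
W_FRAUD = 1.0             # weight on fraud loss
W_OPS = 1.0               # weight on manual-review workload
W_FRICTION = 1.0          # weight on customer friction
REVIEW_COST = 2.0         # assumed cost of one manual review
DELAY_COST = 0.5          # assumed friction when a genuine customer is delayed
FALSE_DECLINE_COST = 5.0  # assumed friction when a genuine customer is declined


def step_cost(action, is_fraud, amount):
    """Weighted sum of fraud, operations, and friction cost for one decision."""
    fraud = ops = friction = 0.0
    if action == "approve":
        if is_fraud:
            fraud = amount  # approved fraud loses the full transaction amount
    elif action == "review":
        ops = REVIEW_COST  # assumption: review always catches fraud
        if not is_fraud:
            friction = DELAY_COST
    elif action == "decline":
        if not is_fraud:
            friction = FALSE_DECLINE_COST
    else:
        raise ValueError(f"unknown action: {action!r}")
    return W_FRAUD * fraud + W_OPS * ops + W_FRICTION * friction


def cql_penalty(q_values, logged_actions):
    """Discrete-action CQL regularizer: mean(logsumexp_a Q(s, a) - Q(s, a_logged)).

    q_values has shape (batch, n_actions); logged_actions holds the index of
    the action actually taken in the offline data for each row.
    """
    q = np.asarray(q_values, dtype=float)
    a = np.asarray(logged_actions, dtype=int)
    row_max = q.max(axis=1, keepdims=True)  # subtract max for numerical stability
    lse = row_max[:, 0] + np.log(np.exp(q - row_max).sum(axis=1))
    q_logged = q[np.arange(len(a)), a]
    return float(np.mean(lse - q_logged))
```

In a full training loop the reward would typically be the negative of `step_cost`, and `cql_penalty`, scaled by a coefficient, would be added to the usual temporal-difference loss. The penalty pushes down Q-values for actions that are absent from the logged data, which is the mechanism behind the reduced out-of-distribution overestimation described above.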
Yang Ximeng
Zhang Yiming
Ximeng et al. (Wed,) studied this question.
synapsesocial.com/papers/6997fa12ad1d9b11b3452f95 — DOI: https://doi.org/10.70393/6a6574626d.333932