Financial markets are characterized by non-stationarity, regime shifts, and complex cross-asset interactions, which challenge traditional portfolio optimization and motivate reinforcement learning (RL) for adaptive decision-making. However, many RL-based approaches remain predominantly return-centric, with risk, uncertainty, and human oversight only weakly integrated, limiting robustness and practical applicability. This review provides a critical synthesis of risk-aware and uncertainty-sensitive reinforcement learning for portfolio optimization from a human–AI collaboration perspective. We analyze major architectural paradigms—including single-agent, hierarchical, multi-agent, and modular systems—together with risk modeling strategies (e.g., reward shaping, constraint-based optimization, and downside risk measures such as CVaR) and probabilistic approaches to uncertainty estimation (e.g., Bayesian neural networks, Monte Carlo dropout, and ensembles). A structured analysis of 57 fully assessed studies reveals that only 5 (9%) explicitly couple uncertainty estimation with risk constraint mechanisms, while 38 (69%) treat risk and uncertainty as structurally independent components. We identify a central structural limitation: risk objectives are rarely conditioned on epistemic uncertainty, while uncertainty estimates seldom influence constraint mechanisms or capital allocation. This decoupling leads to fragmented frameworks that remain difficult to deploy in real financial environments. By integrating architectural design, risk modeling, uncertainty estimation, and evaluation practices, this review proposes a unified, deployment-oriented perspective for developing governance-aligned portfolio decision-support systems.
Khemlichi et al. (Wed,) studied this question.