Portfolio optimization is one of the most difficult sequential decision problems: market uncertainty and the non-stationary nature of financial markets hinder the development of robust strategies. Reinforcement learning is an attractive framework for this problem because it lets agents learn market-adaptive strategies through data-driven interaction. However, existing risk-neutral reinforcement learning solutions for portfolio management focus mainly on maximizing returns and ignore downside risk. To address this limitation, this paper proposes a novel risk-sensitive reinforcement learning framework for portfolio optimization whose learning objective is based on conditional value-at-risk (CVaR), which explicitly controls extreme loss events. The framework formulates portfolio optimization as a Markov decision process and solves it with a linearized actor–critic architecture. The paper also develops theoretical results on the learning process, proving that the CVaR-based formulation is convex and that learning converges under standard assumptions. The proposed algorithm is evaluated in a realistic investment setting using NIFTY 50 market data. In a rolling-window backtest, the proposed model achieves the best risk-adjusted performance, with a Sharpe ratio of 0.610, while substantially reducing tail risk, with a CVaR of −0.121 and a maximum drawdown of −0.198, compared with classical strategies and risk-neutral reinforcement learning solutions. Overall, the results show that integrating coherent risk measures into reinforcement learning is an effective way to build robust, risk-aware portfolio optimization strategies for dynamic financial environments.
Mishra et al. studied this question.
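The abstract evaluates strategies by the Sharpe ratio, CVaR, and maximum drawdown. The following is a minimal sketch of one common way to compute these three metrics from a series of periodic portfolio returns. It is not the paper's implementation. The function names are hypothetical. The CVaR confidence level `alpha`, the zero risk-free rate, and the annualization factor are assumptions, because the abstract does not state them. CVaR is computed with the Rockafellar-Uryasev formulation, a standard convex formulation of CVaR, and is reported as a return so that losses appear as negative numbers. This appears to match the sign of the values quoted above.

```python
import math
import statistics
from typing import Sequence


def cvar_return(returns: Sequence[float], alpha: float = 0.05) -> float:
    """Empirical CVaR of periodic returns, reported as a return (losses < 0).

    Works on losses L = -r with the Rockafellar-Uryasev formula
        CVaR_alpha(L) = min_c [ c + E[max(L - c, 0)] / alpha ],
    whose objective is convex in c. For an empirical sample the objective is
    piecewise linear with kinks at the sample losses, so checking those points
    is exact. When alpha * n is an integer k, the result equals the mean of the
    k worst returns. The default alpha = 0.05 is an assumption.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    losses = [-r for r in returns]
    n = len(losses)

    def objective(c: float) -> float:
        # c plays the role of value-at-risk; the second term is the scaled tail excess.
        return c + sum(max(loss - c, 0.0) for loss in losses) / (alpha * n)

    # Minimum loss-CVaR over candidate thresholds, negated back to a return.
    return -min(objective(c) for c in losses)


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 1) -> float:
    """Mean excess return over its sample standard deviation, optionally annualized.

    Needs at least two non-identical returns; risk_free = 0 is an assumption.
    """
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough fall of compounded wealth, as a negative fraction."""
    wealth = peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r              # compound this period's return
        peak = max(peak, wealth)       # running high-water mark
        worst = min(worst, wealth / peak - 1.0)
    return worst


if __name__ == "__main__":
    # Hypothetical daily returns for illustration only (not NIFTY 50 data).
    daily = [0.02, -0.01, 0.03, -0.05, 0.01, -0.02, 0.04, -0.03, 0.00, 0.01]
    print("CVaR (alpha = 0.2):", cvar_return(daily, alpha=0.2))
    print("Sharpe (annualized, 252 periods):", sharpe_ratio(daily, periods_per_year=252))
    print("Max drawdown:", max_drawdown(daily))
```

Searching the minimum only over the sample losses keeps the CVaR computation exact for an empirical distribution and makes the convexity of the objective in the threshold `c` visible. The quadratic cost is acceptable for short backtest windows.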