April 18, 2026 | Open Access

Risk-Sensitive Reinforcement Learning for Portfolio Optimization Under Stochastic Market Dynamics

Key Points

Aims to develop a risk-sensitive reinforcement learning framework for portfolio optimization that explicitly accounts for downside risk.
Formulates portfolio optimization as a Markov decision process.
Solves the process with a linearized actor–critic architecture.
Proves that the conditional value-at-risk (CVaR) formulation is convex and that learning converges under standard assumptions.
Evaluates the algorithm on NIFTY 50 market data using rolling-window backtesting.
Achieves the best risk-adjusted performance among the compared strategies, with a Sharpe ratio of 0.610.
Reduces tail risk, with a CVaR of −0.121.
Reaches a maximum drawdown of −0.198, significantly better than classical strategies.
Improves on risk-neutral reinforcement learning solutions in both risk-adjusted return and tail risk.
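The headline figures are standard backtest statistics. The sketch below shows one common way to compute them from a series of periodic portfolio returns. The abstract does not state the CVaR confidence level, the risk-free rate, or whether the Sharpe ratio is annualized, so `alpha=0.95`, `risk_free=0.0`, and a per-period (non-annualized) Sharpe ratio are assumptions here. The negative CVaR and drawdown values suggest a return-based sign convention, which the code follows; the function names are illustrative.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0):
    # Per-period Sharpe ratio: mean excess return over its sample standard deviation.
    excess = np.asarray(returns, dtype=float) - risk_free
    return excess.mean() / excess.std(ddof=1)


def historical_cvar(returns, alpha=0.95):
    # Return-based CVaR: average of the returns at or below the (1 - alpha) quantile.
    # Negative values indicate losses in the tail.
    r = np.asarray(returns, dtype=float)
    var_level = np.quantile(r, 1.0 - alpha)
    return r[r <= var_level].mean()


def max_drawdown(returns):
    # Largest peak-to-trough fall of compounded wealth (starting at 1.0), as a negative fraction.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return (wealth / running_peak - 1.0).min()
```

For the toy series `[0.1, -0.2, 0.05, -0.1, 0.15]`, `max_drawdown` gives about −0.244 (wealth falls from 1.1 to 0.8316) and `historical_cvar(r, alpha=0.6)` gives about −0.15, the mean of the two worst returns.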

Abstract

Portfolio optimization is among the most difficult sequential decision problems, because the uncertainty and non-stationarity of financial markets hinder the development of robust strategies. Reinforcement learning is an attractive framework for this problem, as it allows agents to learn market-adaptive strategies through data-driven interaction. However, existing risk-neutral reinforcement learning solutions for portfolio management focus mainly on maximizing returns and ignore downside risk. To address this limitation, this paper proposes a risk-sensitive reinforcement learning framework for portfolio optimization whose learning objective is based on conditional value-at-risk (CVaR) and explicitly controls extreme losses. The portfolio optimization problem is formulated as a Markov decision process and solved with a linearized actor–critic architecture. The paper also develops theoretical results for the learning process, proving that the CVaR-based formulation is convex and that learning converges under standard assumptions. The proposed algorithm is applied in a realistic investment setting using NIFTY 50 market data. In a rolling-window backtest, the proposed model achieves the best risk-adjusted performance, with a Sharpe ratio of 0.610, while significantly reducing tail risk, with a CVaR of −0.121 and a maximum drawdown of −0.198, relative to classical strategies and risk-neutral reinforcement learning solutions. Overall, the results demonstrate that integrating coherent risk measures into reinforcement learning is an effective approach for developing robust, risk-aware portfolio optimization strategies in dynamic financial environments.
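The convexity result can be illustrated with the standard Rockafellar–Uryasev representation of CVaR. The block below assumes a one-period linear portfolio loss $L(w) = -w^\top R$, with weights $w$ on the simplex, asset returns $R$, confidence level $\alpha$, and $N$ observed return vectors $r_1, \dots, r_N$; these symbols are introduced for illustration, and the paper's exact objective may differ.

```latex
\operatorname{CVaR}_\alpha\big(L(w)\big) = \min_{\zeta \in \mathbb{R}} F_\alpha(w, \zeta),
\qquad
F_\alpha(w, \zeta) = \zeta + \frac{1}{1-\alpha}\,\mathbb{E}\big[(-w^\top R - \zeta)^{+}\big],
\qquad (x)^{+} = \max(x, 0)

\widehat{F}_\alpha(w, \zeta) = \zeta + \frac{1}{(1-\alpha)N}\sum_{i=1}^{N}\big(-w^\top r_i - \zeta\big)^{+}
```

Because $(\cdot)^{+}$ is convex and nondecreasing and $-w^\top R - \zeta$ is affine in $(w, \zeta)$, $F_\alpha$ is jointly convex, and minimizing over $\zeta$ keeps $\operatorname{CVaR}_\alpha(L(w))$ convex in $w$. Here CVaR is stated for losses (positive is bad); a return-based figure such as −0.121 would correspond to the negative of this quantity, assuming a continuous return distribution.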
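The abstract does not define what "linearized" means for the actor–critic. One plausible reading, assumed here, is an actor and a critic that are both linear in a state feature vector. The sketch below trains such a pair with a TD(0) critic over one pass of a return array, casting the problem as an MDP: state = trailing returns plus a bias, action = Gaussian-perturbed logits mapped to long-only weights by softmax, reward = portfolio return. The helper names (`features`, `train_linear_actor_critic`, `policy_weights`), the hyperparameters, and the `downside_penalty` reward, used as a simple stand-in for the paper's CVaR objective, are all hypothetical and not the authors' method.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax: maps logits to long-only weights that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()


def features(returns, t, window):
    # State at time t: the trailing `window` rows of returns (no look-ahead) plus a bias.
    return np.append(returns[t - window:t].ravel(), 1.0)


def train_linear_actor_critic(returns, window=5, lr_actor=0.01, lr_critic=0.05,
                              gamma=0.99, sigma=0.1, downside_penalty=2.0, seed=0):
    """Hypothetical linear actor-critic over one pass of a (T, n_assets) return array."""
    rng = np.random.default_rng(seed)
    T, n = returns.shape
    d = window * n + 1
    theta = np.zeros((n, d))  # actor parameters: mean logits = theta @ phi
    v = np.zeros(d)           # critic parameters: V(s) = v @ phi

    for t in range(window, T):
        phi = features(returns, t, window)
        mean_logits = theta @ phi
        # Action: Gaussian-perturbed logits, converted to portfolio weights.
        logits = mean_logits + sigma * rng.standard_normal(n)
        w = softmax(logits)
        port_ret = float(w @ returns[t])
        # Stand-in risk-sensitive reward: losses are penalized more than gains are rewarded.
        reward = port_ret - downside_penalty * max(0.0, -port_ret)
        phi_next = features(returns, t + 1, window)
        td_error = reward + gamma * (v @ phi_next) - v @ phi
        v += lr_critic * td_error * phi  # critic: semi-gradient TD(0) update
        # Actor: gradient of log N(logits; mean_logits, sigma^2 I) with respect to theta.
        grad_log_pi = np.outer((logits - mean_logits) / sigma**2, phi)
        theta += lr_actor * td_error * grad_log_pi
    return theta, v


def policy_weights(theta, returns, t, window=5):
    # Noise-free weights for the state at time t (uses returns up to t - 1).
    return softmax(theta @ features(returns, t, window))
```

Calling `policy_weights(theta, R, len(R))` after training on a return array `R` gives weights for the next, not yet observed, period. A linear policy and critic keep each update cheap; their limited expressiveness is a real trade-off against deep networks.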
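Rolling-window backtesting refits a strategy on a moving training block and evaluates it only on the block that follows, so every reported figure is out of sample. The sketch below assumes fixed-length windows that advance by the test length and weights held constant over each test block; the abstract gives neither the window lengths nor the rebalancing rule, and `rolling_windows`, `rolling_backtest`, and `fit_weights` are hypothetical names.

```python
from typing import Callable, Iterator, Optional, Tuple

import numpy as np


def rolling_windows(n_obs: int, train_size: int, test_size: int,
                    step: Optional[int] = None) -> Iterator[Tuple[range, range]]:
    # Each test block starts right after its training block, so no future data is used for fitting.
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_obs:
        yield (range(start, start + train_size),
               range(start + train_size, start + train_size + test_size))
        start += step


def rolling_backtest(returns: np.ndarray, train_size: int, test_size: int,
                     fit_weights: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    # Refit on each training block, hold the weights fixed over the following test block,
    # and concatenate the out-of-sample portfolio returns.
    oos = []
    for train, test in rolling_windows(len(returns), train_size, test_size):
        w = fit_weights(returns[train.start:train.stop])
        oos.append(returns[test.start:test.stop] @ w)
    return np.concatenate(oos) if oos else np.empty(0)
```

For example, an equal-weight baseline on a `(T, n_assets)` return array `R` could be run as `rolling_backtest(R, 500, 20, lambda block: np.full(block.shape[1], 1 / block.shape[1]))`, with illustrative window sizes, and the resulting series scored with the usual risk metrics.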
