What question did this study set out to answer?

February 6, 2026 | Open Access

A Multi-Model Adaptive Q-Learning Framework for Robust Portfolio Management in Stochastic Markets

Key Points

The central aim is to develop and evaluate TAQLA, a Tabular Adaptive Q-Learning Agent for portfolio management in uncertain financial environments.
Implemented a multi-model reinforcement learning architecture
Compared TAQLA against vanilla Q-Learning, SARSA, and random trading policies
Conducted simulations using equity market data
Performed parameter sensitivity analysis for exploration and discounting strategies
TAQLA achieved a portfolio value of $1687.45, a 68.74% increase from initial capital
Obtained a Sharpe ratio of 1.41, indicating strong risk-adjusted performance
Limited maximum drawdown to just 12.8%
Vanilla Q-Learning and SARSA showed Sharpe ratios below 1.0 and higher drawdowns
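
The metrics quoted in the key points (total return, Sharpe ratio, maximum drawdown) can be illustrated with a minimal Python sketch. This is not the study's evaluation code: the function names are hypothetical, the Sharpe ratio is computed per period without annualization (the source does not say which convention it uses), and the equity curve is toy data.

```python
import math

def total_return(values):
    """Fractional gain from the first to the last portfolio value."""
    return values[-1] / values[0] - 1.0

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return over its sample standard deviation (per period, not annualized)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    if var == 0:
        return float("nan")  # undefined when returns never vary
    return mean / math.sqrt(var)

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

# Toy equity curve (illustrative numbers, not taken from the study).
curve = [100.0, 110.0, 99.0, 120.0, 108.0]
returns = [b / a - 1.0 for a, b in zip(curve, curve[1:])]

print(f"total return: {total_return(curve):.2%}")
print(f"Sharpe ratio: {sharpe_ratio(returns):.3f}")
print(f"max drawdown: {max_drawdown(curve):.2%}")
```

On this toy curve the drawdown is measured twice against a running peak (110 to 99, then 120 to 108), which is why a strategy can finish higher overall and still report a nonzero maximum drawdown.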

Abstract

This study presents TAQLA, a new Tabular Adaptive Q-Learning Agent for portfolio management in stochastic financial markets. TAQLA rests on a multi-model reinforcement learning (RL) architecture that combines parameter-adaptive Q-Learning mechanisms with softmax-based exploration to balance short-term profit maximization against long-term capital preservation. The method is compared with vanilla Q-Learning, SARSA, and a random trading policy on simulated equity market data. Empirical analysis shows that TAQLA performs better on profitability, risk-adjusted performance, and drawdown minimization, with a final portfolio value of 1687.45 (+68.74% relative to initial capital), a Sharpe ratio of 1.41, and a maximum drawdown of just 12.8%. Vanilla Q-Learning and SARSA, by contrast, yield Sharpe ratios below 1.0 and drawdowns exceeding 18%. Parameter sensitivity analysis across β (softmax temperature), α (learning rate), and γ (discount factor) reveals that aggressive exploration (β ≈ 1.0–1.5) and moderate discounting (γ ≈ 0.4–0.6) generate the most aggressive and robust outcomes. These results position TAQLA as a robust RL-based adaptive portfolio control method under uncertainty, with improved capital appreciation and resilience to adverse market conditions.
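
To make the roles of β, α, and γ concrete, the following is a minimal sketch of textbook tabular Q-Learning with softmax (Boltzmann) action selection. It is not TAQLA itself: the abstract does not specify the parameter-adaptive mechanism or the multi-model structure, so neither is implemented here. The sketch assumes β acts as a temperature (action probability proportional to exp(Q / β)), and the state and action layout is hypothetical.

```python
import math
import random

def softmax_probs(q_values, beta):
    """Boltzmann probabilities, treating beta as a temperature (assumption)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / beta) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def choose_action(q_values, beta, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard off-policy Q-Learning update with learning rate alpha and discount gamma."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])

# Hypothetical two-state, three-action table (e.g. sell / hold / buy).
q_table = {0: [0.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
q_update(q_table, state=0, action=2, reward=1.0, next_state=1, alpha=0.5, gamma=0.5)

action = choose_action(q_table[0], beta=1.0)
```

Under the temperature convention assumed here, a larger β flattens the action probabilities and so explores more, while a smaller β concentrates probability on the highest-valued action. If the paper defines β as an inverse temperature instead, the division becomes a multiplication and the interpretation reverses.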
