
February 6, 2026 (Open Access)

A Multi-Model Adaptive Q-Learning Framework for Robust Portfolio Management in Stochastic Markets

Key Points

The central aim is to develop and evaluate TAQLA, a Tabular Adaptive Q-Learning Agent for portfolio management in uncertain financial environments.
Implemented a multi-model reinforcement learning architecture
Compared TAQLA against vanilla Q-Learning, SARSA, and random trading policies
Conducted simulations using equity market data
Performed parameter sensitivity analysis for exploration and discounting strategies
TAQLA achieved a final portfolio value of $1687.45, a 68.74% increase over initial capital
Obtained a Sharpe ratio of 1.41, indicating strong risk-adjusted performance
Limited maximum drawdown to 12.8%
Vanilla Q-Learning and SARSA showed Sharpe ratios below 1.0 and drawdowns exceeding 18%
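
The Sharpe ratio and maximum drawdown quoted above are standard portfolio metrics. The page does not say how the study computed them, so the following is only a minimal sketch under common conventions: the function names, the zero risk-free rate, and the 252-period annualization factor are all assumptions, not details from the paper.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a sequence of per-period returns.

    Assumptions (not from the paper): a constant per-period risk-free
    rate and sample standard deviation (n - 1 in the denominator).
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        return float("nan")
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline of a portfolio value series,
    expressed as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                      # highest value seen so far
        worst = max(worst, (peak - v) / peak)    # decline from that peak
    return worst
```

For example, a value series that rises to 120, falls to 90, and then recovers has a maximum drawdown of 25%, because the worst decline is measured from the running peak of 120.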

Abstract

This study presents TAQLA, a new Tabular Adaptive Q-Learning Agent for portfolio management in stochastic financial markets. TAQLA rests on a multi-model reinforcement learning (RL) architecture that combines parameter-adaptive Q-Learning mechanisms with softmax-based exploration to reconcile short-term profit maximization with long-term capital preservation. The method is compared with vanilla Q-Learning, SARSA, and a random trading policy using simulated equity market data. Empirical analysis shows that TAQLA performs better on profitability, risk-adjusted performance, and drawdown minimization, with a final portfolio value of 1687.45 (+68.74% relative to initial capital), a Sharpe ratio of 1.41, and a maximum drawdown of just 12.8%. Q-Learning and SARSA, on the other hand, yield Sharpe ratios below 1.0 and drawdowns exceeding 18%. Parameter sensitivity analysis across β (softmax temperature), α (learning rate), and γ (discount factor) reveals that aggressive exploration (β ≈ 1.0–1.5) and moderate discounting (γ ≈ 0.4–0.6) generate the most aggressive and robust outcomes. These results position TAQLA as a robust RL-based adaptive portfolio control method under uncertainty, with improved capital appreciation and resilience to adverse market conditions.
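
To make the roles of β, α, and γ concrete, here is a minimal sketch of textbook tabular Q-Learning with softmax (Boltzmann) action selection. It is not TAQLA itself: the abstract does not describe the parameter-adaptive or multi-model mechanisms, so none are shown. The function names are hypothetical, and treating β as a temperature that divides the Q-values (so that larger β means more exploration) is an assumption based on the abstract's "softmax temperature" wording.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Boltzmann action probabilities; beta is treated as a temperature
    (assumption), so larger beta gives a flatter, more exploratory policy."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / beta) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(q_values, beta, rng):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard Q-Learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


# Illustrative usage with a two-state, two-action table (made-up numbers).
q_table = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)
action = choose_action(q_table[0], beta=1.0, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0,
         next_state=1, alpha=0.5, gamma=0.5)
```

In this sketch α controls how far each estimate moves toward the new target, and γ controls how much the best next-state value contributes to that target, which is why the abstract's sensitivity analysis varies all three parameters together.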

Authors

Milon Biswas

Towson University

Md. Borhan Uddin

Masuma Akter Semi

Journals

International Journal of Advanced Computer Science and Applications
