This study presents TAQLA, a Tabular Adaptive Q-Learning Agent for portfolio management in stochastic financial markets. TAQLA rests on a multi-model reinforcement learning (RL) architecture that combines parameter-adaptive Q-Learning mechanisms with softmax-based exploration to reconcile short-term profit maximization with long-term capital preservation. The method is compared with vanilla Q-Learning, SARSA, and a random trading policy on simulated equity market data. Empirically, TAQLA outperforms these baselines on profitability, risk-adjusted performance, and drawdown minimization, reaching a final portfolio value of 1687.45 (+68.74% relative to initial capital), a Sharpe ratio of 1.41, and a maximum drawdown of just 12.8%. Q-Learning and SARSA, by contrast, yield Sharpe ratios below 1.0 and drawdowns exceeding 18%. Parameter sensitivity analysis across β (softmax temperature), α (learning rate), and γ (discount factor) reveals that aggressive exploration (β ≈ 1.0–1.5) and moderate discounting (γ ≈ 0.4–0.6) produce the strongest and most robust outcomes. These results position TAQLA as a robust RL-based method for adaptive portfolio control under uncertainty, with improved capital appreciation and resilience to adverse market conditions.
Biswas et al. (Thu,) studied this question.
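The building blocks named in the abstract (softmax exploration governed by β, a tabular Q-Learning update governed by α and γ, and the two risk metrics reported) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses fixed parameters and does not reproduce TAQLA's parameter-adaptive mechanism, which the abstract does not specify. All function names are hypothetical, β is assumed to act as a temperature that divides the Q-values, and the Sharpe ratio assumes a zero risk-free rate with 252 trading periods per year.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Boltzmann action probabilities over one row of the Q-table.

    Assumption: beta is a temperature, so larger beta means flatter
    probabilities and therefore more exploration.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / beta) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, beta, rng):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard off-policy Q-Learning update, applied in place."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline of a portfolio value series, as a fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


# Toy usage with two states and three actions (e.g. sell, hold, buy).
rng = random.Random(0)
q_table = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
action = select_action(q_table[0], beta=1.0, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0, next_state=1,
         alpha=0.1, gamma=0.5)
print(max_drawdown([100, 120, 90, 130]))  # 0.25
```

In this form, raising β flattens the action distribution, which is one way to read "aggressive exploration" in the sensitivity analysis, while γ around 0.4 to 0.6 weights the next state's value at roughly half the immediate reward.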