This study presents TAQLA, a new Tabular Adaptive Q-Learning Agent for portfolio management in stochastic financial markets. TAQLA rests on a multi-model reinforcement learning (RL) architecture that integrates parameter-adaptive Q-Learning mechanisms into softmax-based exploration to reconcile short-term profit maximization with long-term capital preservation. The method is compared against vanilla Q-Learning, SARSA, and a random trading policy on simulated equity market data. Empirical analysis shows that TAQLA performs better on profitability, risk-adjusted performance, and drawdown minimization, with a final portfolio value of 1687.45 (a 68.74% gain on initial capital), a Sharpe ratio of 1.41, and a maximum drawdown of just 12.8%. Q-Learning and SARSA, by contrast, yield Sharpe ratios below 1.0 and drawdowns exceeding 18%. Parameter sensitivity analysis across β (softmax temperature), α (learning rate), and γ (discount factor) reveals that aggressive exploration (β ≈ 1.0–1.5) and moderate discounting (γ ≈ 0.4–0.6) yield the most aggressive yet robust outcomes. These results position TAQLA as a robust RL-based adaptive method for portfolio control under uncertainty, with improved capital appreciation and resilience to adverse market conditions.
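To make the roles of β, α, and γ concrete, the following is a minimal sketch of generic tabular Q-Learning with softmax (Boltzmann) exploration, plus a maximum-drawdown helper. It is not the TAQLA implementation, whose adaptive mechanisms the abstract does not specify. The function names, the dictionary-based Q-table layout, and the treatment of β as a temperature that divides the Q-values (rather than an inverse temperature) are all assumptions made for illustration.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Boltzmann action probabilities.

    Assumption: beta acts as a temperature, so a larger beta flattens the
    distribution and increases exploration.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / beta) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, beta, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard one-step Q-Learning update (off-policy, max over next actions)."""
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


def max_drawdown(portfolio_values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = portfolio_values[0]
    worst = 0.0
    for value in portfolio_values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

In a training loop under these assumptions, `select_action` would choose a trade (for example buy, hold, or sell) from the current state's row of the Q-table, and `q_update` would adjust that entry once the reward and next state are observed. A smaller γ, such as the 0.4 to 0.6 range reported above, weights near-term rewards more heavily in `td_target`.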
Milon Biswas
Towson University
Md. Borhan Uddin
Masuma Akter Semi
International Journal of Advanced Computer Science and Applications
Biswas et al. studied this question.
synapsesocial.com/papers/698585cb8f7c464f230097e8 — DOI: https://doi.org/10.14569/ijacsa.2026.0170101