Reinforcement Learning (RL) has become one of the most powerful methods in Artificial Intelligence (AI), allowing a system to learn by interacting with its environment and to improve its actions through trial and error. Classical RL algorithms such as Q-Learning convey the fundamental ideas of RL, but their usefulness wanes in high-dimensional or continuous state spaces. To overcome these shortcomings, Deep Reinforcement Learning (DRL) algorithms, including Deep Q-Networks (DQN), use neural networks to approximate action-value functions, which improves scalability and learning efficiency. This paper reports an experimental comparison of Q-Learning and DQN on the CartPole-v1 environment. Both algorithms were trained in a controlled setting and evaluated on several metrics: cumulative reward, success rate, convergence speed, safety (failure rate), and path efficiency. The findings indicate that Q-Learning reaches moderate stability with slower convergence, whereas DQN learns faster, is more reliable, and performs better. These results highlight the ground-breaking potential of DRL for advancing AI beyond the restrictions of classical RL. Future research directions include more challenging experimental settings, evaluation of other DRL algorithms (e.g., PPO, SAC), and the study of generalization, scalability, and safety in real-world environments.
Gul et al. (Sun,) studied this question.
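To make the limitation of tabular Q-Learning in continuous state spaces concrete, the following is a minimal sketch of how a CartPole-style observation might be discretized before a standard Q-Learning update. It is not the implementation used in this paper: the bin count, the clipping ranges, the learning rate, the discount factor, and the helper names `discretize` and `q_update` are all illustrative assumptions.

```python
import numpy as np

# Assumed discretization of the four CartPole observations
# (cart position, cart velocity, pole angle, pole angular velocity).
# Bin count and ranges are illustrative, not taken from the paper.
N_BINS = 6
LOW = np.array([-2.4, -3.0, -0.21, -3.0])
HIGH = np.array([2.4, 3.0, 0.21, 3.0])
# Interior bin edges per dimension; values outside the range fall
# into the first or last bin.
EDGES = [np.linspace(lo, hi, N_BINS + 1)[1:-1] for lo, hi in zip(LOW, HIGH)]


def discretize(obs):
    """Map a continuous 4-D observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, e)) for x, e in zip(obs, EDGES))


def q_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * np.max(q[s_next])
    q[s + (a,)] += alpha * (target - q[s + (a,)])
    return q


# One Q-value per (discrete state, action); CartPole has two actions.
q_table = np.zeros((N_BINS,) * 4 + (2,))

# A single hand-written transition, used only to exercise the update.
s = discretize([0.3, 0.1, 0.02, -0.2])
s_next = discretize([0.31, 0.3, 0.01, -0.5])
q_update(q_table, s, a=1, r=1.0, s_next=s_next, done=False)
```

Under these assumptions the table already holds 6^4 × 2 = 2592 entries, and the count grows exponentially with the number of state dimensions and bins. DQN avoids this growth by replacing the table with a neural network that maps the continuous observation directly to action values.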