Reinforcement learning (RL) has made significant progress in solving both continuous control and discrete-action problems. Each algorithm has different properties, which make it suited to different classes of problems. This paper presents a comparative analysis of three widely used RL algorithms, Proximal Policy Optimization (PPO), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG), to evaluate their performance in the continuous control Pendulum-v1 environment. Each algorithm is implemented with standardized hyperparameters, and its overall performance, convergence speed, and training stability are analyzed under the same experimental setup. The results show that PPO outperforms DDPG and DQN in terms of stability, while DDPG converges fastest of the three. DQN performs poorly in continuous control because it depends on Q-maximization over an enumerated set of discrete actions, which causes large fluctuations during convergence. This work emphasizes the importance of environment-algorithm compatibility and offers experimental support for algorithm selection in continuous control applications.
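To illustrate why DQN depends on discrete action enumeration, the sketch below shows one common way to apply a discrete-action agent to Pendulum-v1, whose action is a single torque in [-2, 2]. The abstract does not say how the action space was discretized, so the grid size, the function names `make_action_grid` and `greedy_torque`, and the Q-values are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Pendulum-v1 accepts a single torque value in [-2.0, 2.0].
TORQUE_LOW, TORQUE_HIGH = -2.0, 2.0


def make_action_grid(n_bins: int) -> np.ndarray:
    """Evenly spaced torques a discrete-action agent can choose from (assumed scheme)."""
    return np.linspace(TORQUE_LOW, TORQUE_HIGH, n_bins)


def greedy_torque(q_values: np.ndarray, grid: np.ndarray) -> float:
    """Q-maximization by enumeration: return the torque with the largest Q-value."""
    return float(grid[int(np.argmax(q_values))])


grid = make_action_grid(5)  # hypothetical resolution: 5 torque levels
# Hypothetical Q-values for one state, one entry per torque in the grid.
q_values = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
torque = greedy_torque(q_values, grid)
```

With a coarse grid like this, the agent can only switch between a few fixed torque levels, whereas PPO and DDPG output a continuous torque directly. A finer grid gives smoother control but enlarges the set of actions the Q-network has to evaluate at every step.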
Hongchang Cui (Wed,) studied this question.