September 10, 2025 · Open Access

Evaluating the Performance Metrics of PPO, DQN, and DDPG in Continuous Control Tasks

Key Points

PPO outperforms DDPG and DQN in training stability for continuous control tasks.
DDPG shows the fastest convergence speed among the studied algorithms.
DQN struggles with continuous control due to dependence on Q-maximization.
Experimental results support effective algorithm selection in continuous control applications.

Abstract

Reinforcement learning (RL) has made significant progress in solving both continuous control and discrete-action problems. Each algorithm has different properties, which makes it suited to different kinds of problems. This paper presents a comparative analysis of three widely used RL algorithms, Proximal Policy Optimization (PPO), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG), evaluating their performance in the continuous control Pendulum-v1 environment. Each algorithm is implemented with standardized hyperparameters and run under the same experimental setup, then compared on overall performance, convergence speed, and training stability. The results show that PPO is more stable than DDPG and DQN, while DDPG converges fastest of the three. DQN performs poorly in continuous control because it depends on Q-maximization over an enumerated set of discrete actions, which causes large fluctuations during convergence. This work emphasizes the importance of environment-algorithm compatibility and offers experimental support for algorithm selection in continuous control applications.
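The remark about Q-maximization and discrete action enumeration can be made concrete with a small sketch. The abstract does not say how DQN was adapted to Pendulum-v1, so the uniform torque grid below is an assumption chosen for illustration, and the names `make_action_grid`, `greedy_torque`, and `max_quantization_error` are hypothetical. The only environment fact used is that Pendulum-v1 takes a single torque in [-2, 2].

```python
import numpy as np

# Pendulum-v1 accepts one continuous torque value in [-2.0, 2.0].
TORQUE_LOW, TORQUE_HIGH = -2.0, 2.0


def make_action_grid(n_bins: int) -> np.ndarray:
    """Evenly spaced torques a discrete-action agent can choose from.

    Assumption: a uniform grid; the paper's actual discretization is not stated.
    """
    return np.linspace(TORQUE_LOW, TORQUE_HIGH, n_bins)


def greedy_torque(q_values: np.ndarray, action_grid: np.ndarray) -> np.ndarray:
    """Q-maximization step: pick the grid torque with the largest Q-value."""
    best_index = int(np.argmax(q_values))
    # Return shape (1,) to match the environment's one-dimensional action.
    return np.array([action_grid[best_index]], dtype=np.float32)


def max_quantization_error(action_grid: np.ndarray) -> float:
    """Largest distance from any in-range torque to its nearest grid point."""
    return float(np.max(np.diff(action_grid)) / 2.0)


grid = make_action_grid(5)                   # torques: -2, -1, 0, 1, 2
q = np.array([0.1, 0.7, 0.3, -0.2, 0.0])     # made-up Q-values for one state
action = greedy_torque(q, grid)
print(action, max_quantization_error(grid))
```

With a coarse grid the agent can only switch between torques that are far apart, so a small change in the Q-value estimates can flip the chosen torque by a full grid step. A finer grid reduces the quantization error but increases the number of outputs the Q-network has to estimate. PPO and DDPG avoid this trade-off by outputting a continuous action directly.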
