Proximal Policy Optimization (PPO) is widely adopted for robotic continuous control, yet it can suffer from insufficient exploration and unstable policy updates in high-dimensional action spaces. This paper proposes Adaptive Exploration Proximal Policy Optimization (AE-PPO), an enhanced PPO framework that integrates (i) adaptive clipping, which adjusts the clipping range according to the observed magnitude of policy updates to better balance stability and learning progress, and (ii) adaptive entropy regularization, which schedules the entropy weight over the course of training to maintain effective exploration while avoiding excessive randomness. AE-PPO is evaluated on standard MuJoCo continuous control benchmarks (e.g., Walker2d, HalfCheetah, and Humanoid) and compared with PPO and representative baselines such as Trust Region Policy Optimization (TRPO) and Soft Actor-Critic (SAC). The results show that AE-PPO achieves faster convergence and improved final performance with reduced training variance, demonstrating more stable and efficient learning in challenging high-dimensional tasks.
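To make the two components concrete, the following is a minimal Python sketch of how an adaptive clip range and a scheduled entropy weight could plug into the standard PPO clipped surrogate. The abstract does not specify the actual update rules, so everything specific here is an assumption for illustration: the function names, the use of a scalar `update_magnitude` (for example, an estimated KL divergence between old and new policies), the multiplicative adjustment with thresholds at half and twice a target value, the clip bounds, and the cosine decay for the entropy weight. Only `clipped_surrogate` follows the standard PPO objective.

```python
import math


def adapt_clip_range(clip, update_magnitude, target=0.01, rate=1.5,
                     lo=0.05, hi=0.3):
    """Hypothetical rule: shrink the clip range when the observed policy
    update is much larger than the target, widen it when much smaller."""
    if update_magnitude > 2.0 * target:
        clip = clip / rate   # updates too large: tighten for stability
    elif update_magnitude < 0.5 * target:
        clip = clip * rate   # updates too small: loosen to speed learning
    return min(max(clip, lo), hi)  # keep the range within assumed bounds


def entropy_coef(step, total_steps, start=0.01, end=0.001):
    """Hypothetical schedule: cosine decay of the entropy weight from
    `start` (more exploration) to `end` (less randomness)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * frac))


def clipped_surrogate(ratio, advantage, clip):
    """Standard PPO clipped surrogate for a single sample."""
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


def sample_objective(ratio, advantage, entropy, clip, ent_coef):
    """Per-sample objective to maximize: surrogate plus entropy bonus."""
    return clipped_surrogate(ratio, advantage, clip) + ent_coef * entropy
```

In a training loop under these assumptions, `adapt_clip_range` would be called once per policy update with the measured update magnitude, and `entropy_coef` once per iteration with the current step count, with both results passed to `sample_objective`.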
Li et al. studied this question.