This research presents a reinforcement learning framework for stable quadruped locomotion using Proximal Policy Optimization (PPO). We address critical challenges in articulated robot control, including mechanical complexity and trajectory instability, by implementing a 12-degree-of-freedom model in PyBullet simulation. Our approach features three key innovations: (1) a hybrid reward function, $R_t = 0.72\,e^{-\Delta\mathrm{CoG}_t} + 0.25\,v_t - 0.11\,\tau_t$, that explicitly prioritizes center-of-gravity (CoG) stabilization; (2) rigorous benchmarking showing that Adam outperforms SGD for policy convergence (68% lower reward variance); and (3) a four-metric evaluation protocol that quantifies locomotion quality through reward progression, CoG deviation, policy loss, and KL-divergence penalties. Experimental results confirm an 87.5% reduction in vertical CoG oscillation (from 2.0″ to 0.25″) over 1 million training steps. Policy optimization reached a loss of $-6.2 \times 10^{-4}$, with KL penalties converging to 0.13, indicating stable gait generation. The framework's efficacy is further supported by consistent CoG stabilization during deployment, demonstrating potential for real-world applications that require robust terrain adaptation.
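To make the weighting in the hybrid reward concrete, the following is a minimal Python sketch that evaluates $R_t$ for a single step and checks the reported 87.5% figure. The abstract does not define $v_t$ or $\tau_t$, so the function below assumes $\Delta\mathrm{CoG}_t$ is a non-negative CoG deviation, $v_t$ a forward velocity, and $\tau_t$ a scalar torque or effort cost. The names `hybrid_reward` and `percent_reduction` are hypothetical illustrations, not the authors' code.

```python
import math


def hybrid_reward(delta_cog: float, velocity: float, torque_cost: float) -> float:
    """Evaluate R_t = 0.72 * exp(-dCoG_t) + 0.25 * v_t - 0.11 * tau_t.

    Assumed meanings (hypothetical; not defined in the abstract):
      delta_cog   -- non-negative CoG deviation at step t
      velocity    -- forward velocity at step t
      torque_cost -- scalar torque/effort penalty at step t
    """
    stability = 0.72 * math.exp(-delta_cog)  # largest when the CoG does not deviate
    progress = 0.25 * velocity               # rewards forward motion
    effort = 0.11 * torque_cost              # penalizes actuation effort
    return stability + progress - effort


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction, in percent, from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # A perfectly stable CoG with no motion or effort earns the full stability term.
    print(hybrid_reward(0.0, 0.0, 0.0))  # 0.72
    # Reported vertical CoG oscillation: 2.0 in before training, 0.25 in after.
    print(percent_reduction(2.0, 0.25))  # 87.5
```

Because $e^{-\Delta\mathrm{CoG}_t}$ lies in $(0, 1]$ for non-negative deviations, the stability term is bounded by 0.72 and decays smoothly as the CoG drifts, so it cannot grow without limit relative to the velocity and torque terms.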
Escudero‐Villa et al. studied this question.