What type of study is this?

This is a quantitative study.

September 28, 2025 | Open Access

Reinforcement Learning Stabilization for Quadrotor UAVs via Lipschitz-Constrained Policy Regularization

Key Points

The proposed method reduces policy variance by 45%, leading to smoother control responses.
Simulation results indicate that the dynamic clipping parameter adjustment accelerates convergence in quadrotor control.
Adapting the clipping threshold in real time provides deterministic guarantees on oscillation magnitude.
Lipschitz continuity is utilized to interpret the clipping mechanism, enhancing the stability of policy updates.
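For orientation, the clipping mechanism referred to in these points is the standard PPO clipped surrogate, written here with a time-varying threshold. This is a sketch in assumed notation, not the paper's own derivation: the sample index i, the advantage estimate \hat{A}_i, and the use of t as the training-iteration index on \epsilon_t are choices made for this illustration.

```latex
% Probability ratio between the updated and the previous policy
r_i(\theta) = \frac{\pi_\theta(a_i \mid s_i)}{\pi_{\theta_{\text{old}}}(a_i \mid s_i)}

% PPO clipped surrogate with an iteration-dependent threshold \epsilon_t
L^{\text{CLIP}}_t(\theta) = \hat{\mathbb{E}}_i\left[
  \min\left( r_i(\theta)\,\hat{A}_i,\;
  \operatorname{clip}\left(r_i(\theta),\, 1-\epsilon_t,\, 1+\epsilon_t\right)\hat{A}_i \right)
\right]

% By construction, the clipped ratio stays within \epsilon_t of 1
\left| \operatorname{clip}\left(r_i(\theta),\, 1-\epsilon_t,\, 1+\epsilon_t\right) - 1 \right| \le \epsilon_t
```

The last line only restates what the clip operator does. How the paper turns this into a Lipschitz-style bound on the policy update step is not spelled out in this summary.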

Abstract

Reinforcement learning (RL), and in particular Proximal Policy Optimization (PPO), has shown promise in high-precision quadrotor unmanned aerial vehicle (QUAV) control. However, the performance of PPO is highly sensitive to the choice of the clipping parameter, and inappropriate settings can lead to unstable training dynamics and excessive policy oscillations, which limit deployment in safety-critical aerial applications. To address this issue, we propose a stability-aware dynamic clipping parameter adjustment strategy, which adapts the clipping threshold ε_t in real time based on a stability variance metric S_t. This adaptive mechanism balances exploration and stability throughout the training process. Furthermore, we provide a Lipschitz continuity interpretation of the clipping mechanism, showing that its adaptation implicitly adjusts a bound on the policy update step, thereby offering a deterministic guarantee on the oscillation magnitude. Extensive simulation results demonstrate that the proposed method reduces policy variance by 45% and accelerates convergence compared to baseline PPO, resulting in smoother control responses and improved robustness under dynamic operating conditions. While developed within the PPO framework, the proposed approach is readily applicable to other on-policy policy gradient methods.
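The abstract does not give the update rule for the clipping threshold or the definition of the stability metric, so the following is only a minimal sketch of how such a scheme could look. Everything specific in it is an assumption made for illustration: the metric (normalized variance of recent episode returns), the multiplicative shrink/grow rule, and the names `stability_metric`, `adapt_clip`, `s_target`, `rate`, `eps_min`, and `eps_max`. Only `clipped_surrogate` follows the standard PPO clipped objective.

```python
from statistics import mean, pvariance


def stability_metric(recent_returns):
    """Hypothetical S_t: variance of recent episode returns, normalized by
    the squared mean. The paper's actual metric is not given in the abstract."""
    m = mean(recent_returns)
    return pvariance(recent_returns) / (m * m + 1e-8)


def adapt_clip(eps, s_t, s_target=0.05, rate=0.1, eps_min=0.05, eps_max=0.3):
    """Assumed rule: shrink the threshold when training looks unstable
    (s_t above target), grow it otherwise, and keep it in [eps_min, eps_max]."""
    if s_t > s_target:
        eps = eps * (1.0 - rate)
    else:
        eps = eps * (1.0 + rate)
    return min(max(eps, eps_min), eps_max)


def clipped_surrogate(ratios, advantages, eps):
    """Standard PPO clipped surrogate, averaged over a batch of samples."""
    terms = []
    for r, a in zip(ratios, advantages):
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, r_clipped * a))
    return mean(terms)
```

In a training loop, one would compute the metric from a window of recent returns after each policy update, pass it to `adapt_clip`, and use the resulting threshold in the next update's surrogate. A smaller threshold tightens the range in which the probability ratio contributes gradient, which is the sense in which adaptation trades exploration for stability.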
