ABSTRACT This work develops a novel off‐policy safe reinforcement learning (RL) approach for optimal tracking of continuous‐time nonlinear systems that are affine in the control input. The main contribution is the synthesis of an optimal tracker with safety guarantees. A novel formulation is developed that enables optimal reference tracking while satisfying state‐based safety constraints. The tracking error and the state dynamics are combined into an augmented system that serves this dual objective, with the primary goal of guaranteeing safety without compromising system performance. To this end, safety during the exploration phase is achieved by dynamically adjusting the control inputs, which are obtained as solutions of a quadratic programming (QP) problem that incorporates zeroing control barrier function (ZCBF) conditions. Additionally, safety during exploitation (the operational phase) of the learned policy is strengthened by integrating a reciprocal control barrier function (RCBF) into the cost function, leading to an effective trade‐off between safety and system performance. Neural networks are employed to approximate the optimal control law, and novel mathematically rigorous proofs are developed to guarantee safety, stability, and convergence towards optimality. Finally, the effectiveness of the approach is assessed using a simulation example.
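The exploration-phase mechanism described in the abstract, a QP that minimally adjusts the control input subject to a ZCBF condition, can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a single barrier function h(x), a linear class-K term alpha * h, and a control-affine system x_dot = f(x) + g(x) u, in which case the QP has a closed-form solution. The function name `cbf_qp_filter` and all parameter names are hypothetical.

```python
def cbf_qp_filter(u_nom, lf_h, lg_h, h, alpha=1.0):
    """Hypothetical single-constraint ZCBF safety filter.

    Solves  min ||u - u_nom||^2  s.t.  lf_h + lg_h . u + alpha * h >= 0
    in closed form (valid because there is only one linear constraint).

    u_nom : nominal (exploratory) control input, list of floats
    lf_h  : Lie derivative of h along f, scalar
    lg_h  : Lie derivative of h along g, list of floats
    h     : barrier function value at the current state, scalar
    alpha : gain of the assumed linear class-K function
    """
    # Rewrite the ZCBF condition as a . u >= b.
    a = lg_h
    b = -(lf_h + alpha * h)
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        # The nominal input already satisfies the condition.
        return list(u_nom)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint infeasible: L_g h is zero")
    # Project u_nom onto the boundary of the half-space a . u >= b.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


# Toy example (assumed, not from the paper): single integrator x_dot = u
# with safe set x <= 1, so h(x) = 1 - x, L_f h = 0, L_g h = -1.
x = 0.9
u_safe = cbf_qp_filter(u_nom=[1.0], lf_h=0.0, lg_h=[-1.0], h=1.0 - x)
```

In this toy case the nominal input pushes the state towards the boundary, so the filter reduces it to roughly 0.1, while an input that moves away from the boundary is returned unchanged. With several constraints or input bounds, a general QP solver would be needed instead of the closed-form projection.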
Soha Kanso
Centre National de la Recherche Scientifique
Mayank Shekhar Jha
Centre National de la Recherche Scientifique
Didier Theilliol
Centre National de la Recherche Scientifique
Université de Lorraine
International Journal of Robust and Nonlinear Control
Kanso et al. (Wed,) studied this question.
synapsesocial.com/papers/69be36416e48c4981c675146 — DOI: https://doi.org/10.1002/rnc.70518