What question did this study set out to answer?

The study set out to develop a safe reinforcement learning method for optimal tracking of continuous-time nonlinear systems, so that the learned controller tracks a reference without violating state-based safety constraints.

March 21, 2026

Safe Reinforcement Learning for Optimal Tracking of Continuous‐Time Nonlinear Systems

Key Points

Synthesizes an optimal tracker under safety guarantees using an off-policy approach.
Forms an augmented system from the tracking error and the state dynamics.
Enforces safety in both phases: a quadratic program with zeroing control barrier function (ZCBF) conditions during exploration, and a reciprocal control barrier function (RCBF) in the cost function during exploitation.
Uses neural networks to approximate the optimal control law.
Provides mathematical proofs of safety, stability, and convergence towards optimality.
Assesses the approach on a simulation example, reporting that safety is maintained while optimal tracking performance is achieved.
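
The ingredients named in the key points can be written out in standard control-barrier-function notation. This is a sketch only: the symbols $f$, $g$, $h$, $\alpha$, $x_d$, $z$, and $u_{\mathrm{nom}}$ are assumed here for illustration and are not taken from the paper, whose exact formulation may differ.

```latex
% Control-affine dynamics, tracking error, and one possible augmented state
% (x_d is an assumed reference trajectory)
\[
\dot{x} = f(x) + g(x)\,u, \qquad e = x - x_d, \qquad
z = \begin{bmatrix} e \\ x \end{bmatrix}
\]

% Safe set defined by an assumed function h, and the usual ZCBF condition
% with an extended class-K function \alpha
\[
\mathcal{C} = \{\, x : h(x) \ge 0 \,\}, \qquad
L_f h(x) + L_g h(x)\,u + \alpha\bigl(h(x)\bigr) \ge 0
\]

% Exploration-phase safety filter: minimally modify a nominal input u_nom
\[
u^{\star} = \arg\min_{u}\; \lVert u - u_{\mathrm{nom}} \rVert^{2}
\quad \text{s.t.} \quad
L_f h(x) + L_g h(x)\,u + \alpha\bigl(h(x)\bigr) \ge 0
\]

% A common choice of reciprocal barrier, which grows unbounded
% as x approaches the boundary of the safe set
\[
B(x) = \frac{1}{h(x)}
\]
```

Adding a term such as $B(x)$ to the running cost penalizes trajectories that approach the boundary of $\mathcal{C}$, which is one way to read the trade-off between safety and performance described in the abstract.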

Abstract

This work develops a novel off‐policy safe reinforcement learning (RL) approach for optimal tracking of continuous‐time nonlinear systems that are affine in the control input. The main contribution is the synthesis of an optimal tracker under safety guarantees. A novel formulation is developed that enables optimal tracking of references while satisfying state‐based safety constraints. The tracking error and the state dynamics are combined into an augmented system that supports this dual objective, with the primary goal of guaranteeing safety without compromising system performance. To this end, safety during the exploration phase is achieved by dynamically adjusting the control inputs, which are obtained as solutions of a quadratic programming (QP) problem that incorporates zeroing control barrier function (ZCBF) conditions. Additionally, safety during exploitation (the operational phase) of the learned policy is strengthened by integrating a reciprocal control barrier function (RCBF) into the cost function, leading to an effective trade‐off between safety and system performance. Neural networks are employed to approximate the optimal control law, and novel, mathematically rigorous proofs are developed to guarantee safety, stability, and convergence towards optimality. Finally, the effectiveness of the approach is assessed using a simulation example.
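
To make the exploration-phase mechanism concrete, the following is a minimal sketch of a ZCBF-based QP safety filter, not the authors' implementation. It assumes a single barrier constraint of the form `lf_h + lg_h . u + gamma * h >= 0` with a linear class-K function, in which case the QP that minimizes the squared distance to a nominal input has a closed-form solution (a projection onto a half-space). The function name `zcbf_safety_filter` and all parameter names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def zcbf_safety_filter(
    u_nom: Sequence[float],
    lf_h: float,
    lg_h: Sequence[float],
    h: float,
    gamma: float = 1.0,
) -> List[float]:
    """Closed-form solution of
        min ||u - u_nom||^2  s.t.  lf_h + lg_h . u + gamma * h >= 0.

    lf_h and lg_h are the Lie derivatives of the (assumed) barrier
    function h along f and g, evaluated at the current state.
    """
    # Constraint value at the nominal input; non-negative means already safe.
    margin = lf_h + dot(lg_h, u_nom) + gamma * h
    if margin >= 0.0:
        return list(u_nom)

    norm_sq = dot(lg_h, lg_h)
    if norm_sq == 0.0:
        # The input cannot influence h here, so the QP is infeasible.
        raise ValueError("ZCBF constraint infeasible: L_g h is zero")

    # Project u_nom onto the boundary of the half-space constraint.
    scale = margin / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, lg_h)]


# Illustrative scalar system x' = u with safe set x <= 1, i.e. h(x) = 1 - x,
# so L_f h = 0 and L_g h = -1 (all values assumed for this example).
x = 0.5
u_safe = zcbf_safety_filter(u_nom=[2.0], lf_h=0.0, lg_h=[-1.0], h=1.0 - x)
u_kept = zcbf_safety_filter(u_nom=[0.2], lf_h=0.0, lg_h=[-1.0], h=1.0 - x)
```

In this example the aggressive nominal input `2.0` is reduced to the largest value the constraint allows, while the mild input `0.2` passes through unchanged. With several simultaneous constraints or input bounds, a general QP solver would be needed instead of the closed form.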

Authors

Soha Kanso

Centre National de la Recherche Scientifique

Mayank Shekhar Jha

Centre National de la Recherche Scientifique

Didier Theilliol

Journals

International Journal of Robust and Nonlinear Control

Institutions

Centre National de la Recherche Scientifique

Université de Lorraine
