What question did this study set out to answer?

This research aims to develop a safe reinforcement learning framework for nonlinear systems that ensures safe initialization, exploration, and learning.

May 10, 2026

Input-to-State Safety for Reinforcement Learning

Key Points

Proposed a novel off-policy safe reinforcement learning framework for nonlinear dynamical systems.
Incorporated input-to-state safety constraints into a quadratic program for safe exploration.
Used neural networks to approximate the value function and the control policy.
Enforced strict safety constraints during exploration while accommodating high-magnitude exploration noise.
Demonstrated effective state-space exploration and successful learning of the optimal control law in simulations.
Established safety, optimality, and stability properties in a mathematically rigorous manner.

Abstract

In this article, we present a novel off-policy, safe reinforcement learning (RL) approach for nonlinear dynamical systems under input saturation that guarantees safe initialization, safe exploration, and safe learning of optimal control laws. First, to encourage preferable exploration near safety boundaries, important for integrating system behavior near the safety limits, we formulate a safe exploration approach as a robust control problem by considering an enlarged safe set based on input-to-state safe control barrier functions (ISSf-CBFs). These constraints are then incorporated into a quadratic programming (QP) optimization. We propose a novel -tuning law that adaptively enforces stricter safety constraints near the boundaries of the safe set and relaxes constraints deeper within the safe set, encouraging safety boundary-proximal exploration while maintaining forward invariance of the safe set. The proposed -tuning law safely accommodates aggressive, high-magnitude exploration noise, enabling efficient state-space exploration without compromising safety. Next, safe learning under saturation limits is guaranteed through a safety-aware cost function. We establish novel safety, optimality, and stability properties in a mathematically rigorous manner. Furthermore, the safe RL problem is solved in an off-policy manner, and neural networks are used to approximate the value function and the control policy. To that end, we establish a novel off-policy equation under input saturation. Finally, simulations demonstrate the efficacy of the proposed framework.
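To make the idea of a barrier-function constraint inside a QP concrete, here is a minimal Python sketch of a safety filter for a toy scalar system. It is an illustration under stated assumptions, not the authors' formulation: the dynamics `x_dot = u`, the barrier `h(x) = 1 - x^2`, the gain `ALPHA`, and the margin term `a*a / eps` with `eps = EPS0 * exp(LAM * h)` are all hypothetical choices made here. The margin is simply one way to tighten the constraint near the boundary and relax it deeper inside the safe set; the tuning law in the paper is not reproduced (its symbol is missing from this page), and input saturation is ignored.

```python
import math
import random

ALPHA = 1.0  # gain in alpha(h) = ALPHA * h (assumed for illustration)
EPS0 = 1.0   # hypothetical margin parameter at the safe-set boundary
LAM = 2.0    # hypothetical rate at which the margin relaxes as h grows


def barrier(x):
    """h(x) = 1 - x^2; the safe set is {x : h(x) >= 0}, i.e. |x| <= 1."""
    return 1.0 - x * x


def safety_filter(x, u_nom):
    """Solve min_u (u - u_nom)^2 subject to a*u >= b, in closed form.

    For x_dot = u we have L_f h = 0 and a = L_g h = -2x. The bound
    b = -ALPHA*h + a^2/eps is strictest near the boundary (small h)
    and looser deep inside the safe set (large h).
    """
    h = barrier(x)
    a = -2.0 * x
    eps = EPS0 * math.exp(LAM * h)
    b = -ALPHA * h + a * a / eps
    if a * u_nom >= b:
        return u_nom  # nominal (noisy) input already satisfies the constraint
    # Here a != 0: at x = 0 we get a = 0 and b = -ALPHA < 0, so the branch
    # above is always taken. Project onto the constraint boundary a*u = b.
    return b / a


def simulate(x0=0.0, steps=2000, dt=0.01, noise=5.0, seed=0):
    """Euler-integrate x_dot = u under filtered uniform noise; return min h."""
    rng = random.Random(seed)
    x = x0
    min_h = barrier(x)
    for _ in range(steps):
        u_nom = rng.uniform(-noise, noise)  # pure exploration noise
        x += dt * safety_filter(x, u_nom)
        min_h = min(min_h, barrier(x))
    return min_h
```

Calling `simulate()` drives the state with large random inputs and returns the smallest barrier value seen along the trajectory; a nonnegative result means the state never left the safe set. With a single scalar constraint the QP reduces to a projection onto a half-line, which is why no QP solver is needed in this sketch; a multi-input system with saturation limits would generally require a numerical solver.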

Authors

Mayank Shekhar Jha

Centre National de la Recherche Scientifique

Satya Marthi

Centre National de la Recherche Scientifique

Kyriakos G. Vamvoudakis

Georgia Institute of Technology

Journals

IEEE Transactions on Neural Networks and Learning Systems

Institutions

Centre National de la Recherche Scientifique

Georgia Institute of Technology

Université de Lorraine
