Abstract
This work develops a reinforcement learning (RL)‐based control framework for nonlinear systems that optimizes economic performance while ensuring operational safety. The proposed method combines a real‐time RL feedback policy with an online supervisory mechanism that enforces prescribed safety and closed‐loop stability conditions and, when an RL action violates them, falls back on a Lyapunov‐based economic model predictive controller (LEMPC) as a backup controller. To reduce reliance on the backup controller, the RL policy is trained in a constraint‐informed environment that penalizes rejected actions. The framework is applied to a nonlinear chemical process with an economic objective, operational safety considerations, and bounded disturbances. Simulations show recovery from unsafe initial states, improved economic performance relative to both LEMPC and imitation learning, and a much lower online computational burden than LEMPC.
Cui et al. studied this question.
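The supervisory scheme summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar dynamics in `model_step`, the quadratic Lyapunov function, the level `rho` defining the safe set, the acceptance rule in `supervise`, and the penalty in `shaped_reward`. The backup controller is a plain stabilizing feedback law standing in for LEMPC, which would solve an optimization problem online.

```python
from typing import Callable, Tuple


def lyapunov(x: float) -> float:
    """Hypothetical quadratic Lyapunov function V(x) = x^2."""
    return x * x


def model_step(x: float, u: float, dt: float = 0.1) -> float:
    """One explicit-Euler step of an assumed toy model dx/dt = -x + u."""
    return x + dt * (-x + u)


def supervise(
    x: float,
    rl_policy: Callable[[float], float],
    backup_policy: Callable[[float], float],
    rho: float = 1.0,
) -> Tuple[float, bool]:
    """Return (applied action, rejected flag).

    Assumed acceptance rule:
    - inside the safe set {V(x) <= rho}: the predicted next state must stay inside;
    - outside the safe set: the predicted next state must decrease V (recovery).
    If the RL action fails the check, the backup controller's action is applied.
    """
    u_rl = rl_policy(x)
    x_next = model_step(x, u_rl)
    if lyapunov(x) <= rho:
        accepted = lyapunov(x_next) <= rho
    else:
        accepted = lyapunov(x_next) < lyapunov(x)
    if accepted:
        return u_rl, False
    return backup_policy(x), True


def shaped_reward(economic_reward: float, rejected: bool, penalty: float = 10.0) -> float:
    """Training reward: economic term minus an assumed penalty for rejected actions."""
    return economic_reward - penalty if rejected else economic_reward


if __name__ == "__main__":
    aggressive_rl = lambda x: 5.0   # stand-in for a trained RL policy
    backup = lambda x: -x           # stand-in for LEMPC (not an actual MPC)
    for x0 in (0.0, 0.9, 2.0):
        u, rejected = supervise(x0, aggressive_rl, backup)
        print(f"x={x0:.1f} applied u={u:.2f} rejected={rejected}")
```

In this sketch the rejected flag serves two purposes: online it selects which controller acts, and during training it feeds `shaped_reward`, so the policy is discouraged from proposing actions the supervisor would override.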