What question did this study set out to answer?

The aim is to enhance policy robustness in reinforcement learning by leveraging historically optimal policies and regulating performance.

March 16, 2026

Robust Reinforcement Learning via Leveraging Historically Optimal Policy With Regulation of Performance

Key Points

The aim is to enhance policy robustness in reinforcement learning by leveraging historically optimal policies and regulating performance.
The authors developed HORP, an approach in which the historically optimal policy guides policy optimization.
HORP constructs a guidance value function that accounts for both value gaps and policy distribution divergence.
An adaptive performance-aware optimization mechanism triggers timely corrections when the agent deviates from optimal performance.
Perturbation entropy is modulated dynamically through controlled uncertainty injection.
HORP outperformed traditional reinforcement learning methods in most cases.
Robustness against various state attacks improved.
Natural performance improved in most experiments.

Abstract

Most existing adversarial training methods in reinforcement learning (RL) offer limited robustness and remain vulnerable to novel attacks. To address this limitation, an approach that enhances policy robustness by leveraging the historically optimal policy to guide policy optimization and generating diverse adversarial perturbations, termed robust RL via leveraging historically optimal policy with regulation of performance (HORP), is proposed. Unlike other approaches that rely solely on trial-and-error interactions, HORP constructs a guidance value function by simultaneously considering value gaps and policy distribution divergence, thereby focusing on prioritized learning in promising action spaces. It also incorporates an adaptive performance-aware optimization mechanism to trigger timely corrections, preventing the agent from deviating from optimal performance. Furthermore, HORP dynamically modulates perturbation entropy through controlled uncertainty injection, thereby improving the agent's generalized defensive capabilities. Experiments demonstrate that HORP achieves superior performance in most cases regarding both natural performance and robustness against various state attacks.
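
The abstract names the ingredients of HORP but gives no formulas, so the following Python sketch is only an illustration of how two of those ingredients could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function `guidance_signal`, the weight `beta`, the use of KL divergence as the measure of policy distribution divergence, the class `PerformanceMonitor`, and its `tolerance` threshold are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def guidance_signal(v_hist, v_curr, pi_hist, pi_curr, beta=0.5):
    """Hypothetical guidance score for one state.

    Combines the value gap between the historically optimal policy and the
    current policy with the divergence between their action distributions.
    The additive form and the weight `beta` are assumptions, not the paper's.
    """
    value_gap = max(v_hist - v_curr, 0.0)
    return value_gap + beta * kl_divergence(pi_hist, pi_curr)


class PerformanceMonitor:
    """Hypothetical performance-aware trigger.

    Remembers the best episode return seen so far and signals a correction
    when the current return falls more than `tolerance` (as a fraction of
    the best return's magnitude) below it.
    """

    def __init__(self, tolerance=0.1):
        self.tolerance = tolerance
        self.best_return = -math.inf

    def update(self, episode_return):
        # A new best return becomes the historical reference; no correction.
        if episode_return > self.best_return:
            self.best_return = episode_return
            return False
        threshold = self.best_return - self.tolerance * abs(self.best_return)
        return episode_return < threshold


if __name__ == "__main__":
    monitor = PerformanceMonitor(tolerance=0.1)
    for ret in [100.0, 95.0, 80.0]:
        print(ret, "correction needed:", monitor.update(ret))
    print(guidance_signal(2.0, 1.0, [0.9, 0.1], [0.5, 0.5]))
```

In this sketch, a larger guidance score marks states where the current policy both underperforms and disagrees with the historically optimal one, and the monitor flags a correction only after a drop that exceeds the chosen tolerance. How HORP actually weights these terms and decides when to correct is not stated in the abstract.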
