Deep Reinforcement Learning (DRL) policies are often fragile in unseen environments, which limits their deployment in safety-critical applications. Robust Markov Decision Processes (R-MDPs) improve control performance by optimizing against worst-case disturbances, but the resulting conservative behaviors are difficult to interpret with standard Explainable RL (XRL) methods, which typically ignore adversarial disturbances. To bridge this gap, this paper proposes RAISE (Robust and Adversarially Informed Safe Explanations), a framework designed for the Noisy Action Robust MDP (NR-MDP) setting. We first introduce the Decomposed Reward NR-MDP (DRNR-MDP) and the DRNR-Deep Deterministic Policy Gradient (DRNR-DDPG) algorithm, which learns robust policies together with a vector-valued value function. RAISE uses this vector-valued value function to generate contrastive explanations (“Why action a instead of b?”) that explicitly highlight which reward components, such as safety or energy efficiency, are prioritized under worst-case attacks. Experiments on a continuous Cliffworld benchmark and the MuJoCo Hopper task show that the proposed method preserves robust performance under dynamics variations and produces meaningful component-level explanations that align with intuitive safety and performance trade-offs. Ablation results further show that ignoring worst-case disturbances can substantially alter or invalidate explanations, underscoring the importance of adversarial awareness for reliable interpretability in robust RL.
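The abstract does not state RAISE's exact formulas, so the following is only a minimal Python sketch of the general idea of a reward-decomposed contrastive explanation under worst-case action noise. It assumes an NR-MDP-style perturbation in which the executed action is the mixture (1 - α)·a + α·a_adv, and it approximates the worst case by searching a small finite set of adversary actions (a simplification swapped in for a trained adversary policy). The critic `toy_vector_critic`, its three components (`progress`, `safety`, `energy`), and all numbers are hypothetical stand-ins for a learned DRNR-DDPG critic, not the paper's model.

```python
import numpy as np

# Hypothetical reward components, chosen only for illustration.
COMPONENTS = ("progress", "safety", "energy")


def toy_vector_critic(state, action):
    """Hypothetical stand-in for a learned DRNR-DDPG critic: returns one Q value per component."""
    state = np.asarray(state, dtype=float)
    action = np.asarray(action, dtype=float)
    progress = action[0]                                  # moving in +x earns progress
    safety = -10.0 * max(0.0, state[1] + action[1])       # penalty once y crosses 0 (the "cliff")
    energy = -0.5 * np.sum(action ** 2)                   # quadratic control cost
    return np.array([progress, safety, energy], dtype=float)


def mixed_action(agent_action, adv_action, alpha):
    """Assumed NR-MDP-style perturbation: (1 - alpha) * agent action + alpha * adversary action."""
    return ((1.0 - alpha) * np.asarray(agent_action, dtype=float)
            + alpha * np.asarray(adv_action, dtype=float))


def worst_case_q(critic, state, action, adv_candidates, alpha):
    """Component-wise Q of `action` under the adversary candidate that minimizes the summed Q.

    The finite search over `adv_candidates` is a simplification of a learned adversary.
    """
    q_vectors = [critic(state, mixed_action(action, adv, alpha)) for adv in adv_candidates]
    return min(q_vectors, key=lambda q: float(q.sum()))


def contrastive_explanation(critic, state, a, b, adv_candidates, alpha):
    """Per-component answer to 'Why a instead of b?': worst-case Q_c(s, a) - Q_c(s, b)."""
    delta = (worst_case_q(critic, state, a, adv_candidates, alpha)
             - worst_case_q(critic, state, b, adv_candidates, alpha))
    return {name: float(value) for name, value in zip(COMPONENTS, delta)}


if __name__ == "__main__":
    state = np.array([0.0, -0.3])        # agent sits 0.3 below the cliff edge
    cautious = np.array([0.5, -0.2])     # action a: slower, steers away from the edge
    aggressive = np.array([1.0, 0.2])    # action b: faster, drifts toward the edge
    adversary = [np.array(v, dtype=float) for v in [(0, 0), (1, 1), (-1, -1), (1, -1), (-1, 1)]]

    for alpha in (0.3, 0.0):             # alpha = 0.0 ignores the adversary entirely
        expl = contrastive_explanation(toy_vector_critic, state, cautious, aggressive, adversary, alpha)
        total = sum(expl.values())
        parts = ", ".join(f"{k}={v:+.3f}" for k, v in expl.items())
        print(f"alpha={alpha}: {parts} (total {total:+.3f})")
```

In this toy setting, with alpha = 0.3 the worst-case adversary pushes the aggressive action over the edge, so the explanation credits action a mainly to the safety component; with alpha = 0.0 the safety component contributes nothing and action b has the higher total value. This is a small-scale illustration of the abstract's point that ignoring worst-case disturbances can change the explanation.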
Kim et al. studied this question.