We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary manipulates the rewards to mislead a sequence of RL agents with unknown algorithms into learning a nefarious policy in an environment that is unknown to the adversary a priori. That is, our attack makes minimal assumptions on the prior knowledge of the adversary: it has no initial knowledge of the environment or the learner, and it observes nothing of the learner's internal mechanism except its performed actions. We design a black-box attack, U2, that can provably achieve near-matching performance to the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
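The interaction protocol described above can be illustrated with a small sketch. This is not U2: the fixed-penalty rule in `poison`, the single-state toy environment, the epsilon-greedy learner, and all names and parameter values are assumptions made for illustration only. The point is the information flow: the adversary sees only the performed action and the true reward, and decides what reward the learner receives.

```python
import random


class EpsilonGreedyLearner:
    """Stand-in for the unknown learner; the adversary never reads its fields."""

    def __init__(self, n_actions, epsilon, rng):
        self.q = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = rng

    def act(self):
        # Try every action once, then explore with probability epsilon.
        for action, count in enumerate(self.counts):
            if count == 0:
                return action
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return self.greedy_action()

    def greedy_action(self):
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action, reward):
        # Incremental sample-average estimate of each action's reward.
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]


def poison(action, reward, target_action, penalty):
    """Hypothetical heuristic (not U2): penalize every non-target action."""
    return reward if action == target_action else reward - penalty


def run(attack, steps=200, seed=0):
    rng = random.Random(seed)
    true_rewards = [1.0, 0.5]  # toy environment; action 0 is truly optimal
    learner = EpsilonGreedyLearner(2, 0.1, rng)
    cost = 0.0  # total absolute reward perturbation spent by the adversary
    for _ in range(steps):
        action = learner.act()
        reward = true_rewards[action]
        # The adversary observes only (action, reward), never learner.q.
        fed = poison(action, reward, target_action=1, penalty=1.0) if attack else reward
        cost += abs(fed - reward)
        learner.update(action, fed)
    return learner.greedy_action(), cost
```

In this toy setting, `run(attack=False)` leaves the learner preferring action 0, while `run(attack=True)` steers it to the target action 1 at a nonzero perturbation cost. The penalty here is simply assumed to exceed the true reward gap; a black-box attacker does not know that gap in advance and would have to work it out from its observations.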
Rakhsha et al. studied this question.