February 16, 2021 · Open Access

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

Abstract

We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms into learning a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimal assumptions on the prior knowledge of the adversary: it has no initial knowledge of the environment or the learner, and it does not observe the learner's internal mechanism except for its performed actions. We design a black-box attack, U2, that can provably achieve near-matching performance to the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
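
To make the black-box threat model concrete, here is a minimal sketch of the interaction loop, assuming a toy setting that is not from the paper. The adversary sits between the environment and the learner, sees only the current state, the action the learner performed, and the environment's reward, and returns a poisoned reward. Everything here is a hypothetical illustration: `ChainEnv` (a 3-state chain MDP), the epsilon-greedy Q-learning learner, and `NaivePoisoner` (a simple "penalize every off-target action" heuristic). It is not the U2 attack and makes no attempt to be cost-efficient.

```python
import random
from typing import List, Tuple

# Illustrative sketch only: a toy chain MDP, a Q-learning learner, and a
# naive reward-poisoning attacker. None of this is the paper's U2 attack.


class ChainEnv:
    """Hypothetical toy MDP: states 0..n-1; action 1 moves right, action 0 left.
    Entering the last state pays reward 1 and teleports the agent back to 0."""

    def __init__(self, n_states: int = 3) -> None:
        self.n_states = n_states
        self.state = 0

    def step(self, action: int) -> Tuple[int, float]:
        if action == 1:
            self.state += 1
        else:
            self.state = max(self.state - 1, 0)
        if self.state == self.n_states - 1:
            self.state = 0  # goal reached: pay out and reset
            return self.state, 1.0
        return self.state, 0.0


class NaivePoisoner:
    """Hypothetical black-box attacker: it sees only the state, the performed
    action, and the environment reward, never the learner's internals.
    It penalizes every action that deviates from its target policy."""

    def __init__(self, target_policy: List[int], penalty: float = 2.0) -> None:
        self.target_policy = target_policy
        self.penalty = penalty
        self.cost = 0.0  # total absolute reward perturbation spent so far

    def poison(self, state: int, action: int, reward: float) -> float:
        if action == self.target_policy[state]:
            return reward  # on-target actions are left untouched
        self.cost += self.penalty
        return reward - self.penalty


def train(attack: bool, steps: int = 5000, seed: int = 0) -> Tuple[List[int], float]:
    """Run epsilon-greedy Q-learning (unknown to the attacker) and return the
    learner's greedy policy on non-goal states plus the attacker's total cost."""
    rng = random.Random(seed)
    env = ChainEnv(n_states=3)
    n_visit = env.n_states - 1  # the goal state is never occupied
    q = [[0.0, 0.0] for _ in range(n_visit)]
    attacker = NaivePoisoner(target_policy=[0] * n_visit) if attack else None
    alpha, gamma, epsilon = 0.5, 0.9, 0.2
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore
        else:
            action = max(range(2), key=lambda a: q[state][a])  # ties -> action 0
        next_state, reward = env.step(action)
        if attacker is not None:
            reward = attacker.poison(state, action, reward)  # learner sees this
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    policy = [max(range(2), key=lambda a: q[s][a]) for s in range(n_visit)]
    return policy, attacker.cost if attacker is not None else 0.0


if __name__ == "__main__":
    for attack in (False, True):
        policy, cost = train(attack)
        print(f"attack={attack}: greedy policy={policy}, cost={cost}")
```

In this toy setup the true optimum is to move right (action 1), while the attacker's target policy is to always move left (action 0). Because every off-target action is penalized by more than the largest true reward, the poisoned Q-values for action 1 can never exceed those for action 0, so the learner's greedy policy matches the target. The price is a large cumulative perturbation (`cost`); the point of a method like U2, per the abstract, is to approach white-box attack performance without this kind of blunt, knowledge-free overspending, which this sketch does not attempt.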

Cite This Study

Rakhsha et al. (2021) studied black-box reward poisoning attacks against reinforcement learning.

synapsesocial.com/papers/6a15b8cd814bf8ec9a4ef827 https://doi.org/10.48550/arxiv.2102.08492