March 28, 2020 · Open Access

Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

Abstract

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an optimal stealthy attack for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an offline setting where the agent is doing planning in the poisoned environment, and (ii) an online setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
