May 21, 2024

RL-ARNE: A Reinforcement Learning Algorithm for Computing Average Reward Nash Equilibrium of Nonzero-Sum Stochastic Games

Key Points

Key points are not available for this paper at this time.

Abstract

Stochastic games model the strategic interactions between two or more players that occur in a sequence of stages. In this paper we focus on computing the average reward Nash equilibrium (ARNE) of a nonzero-sum stochastic game when the transition probabilities of the game and reward structure of the players are unknown. We note that the current state-of-the-art reinforcement learning (RL) algorithms that compute the ARNE of nonzero-sum stochastic games requires solving a matrix game corresponding to each state of the game at every iteration of the algorithm, which is PPAD Polynomial Parity Arguments on Directed graphs.PPAD-def-complete and incurs a memory complexity that is exponential in the number of players. In this paper, we use temporal difference error minimization and stochastic approximation to develop a scalable RL algorithm to compute an ARNE of nonzero-sum stochastic games. We prove the convergence of our algorithm to an ARNE. We evaluate the performance of our algorithm using an attacker-defender game modeled on a real-world ransomware dataset.

RL-ARNE: A Reinforcement Learning Algorithm for Computing Average Reward Nash Equilibrium of Nonzero-Sum Stochastic Games

Key Points

Abstract

Cite This Study