August 8, 2024 · Open Access

A Single-Loop Finite-Time Convergent Policy Optimization Algorithm for Mean Field Games (and Average-Reward Markov Decision Processes)


Abstract

We study the problem of finding an equilibrium of a mean field game (MFG): a policy that performs optimally in a Markov decision process (MDP) determined by the induced mean field, where the mean field is a distribution over a population of agents and is itself a function of the policy. Prior solutions to MFGs rely on either a contraction assumption on a mean field optimality-consistency operator or strict weak monotonicity. The class of problems satisfying these assumptions represents only a small subset of MFGs and excludes any MFG admitting more than one equilibrium. In this work, we expand the class of solvable MFGs by introducing a "herding condition" and propose a direct gradient-based policy optimization algorithm that provably finds a (not necessarily unique) equilibrium within the class. The algorithm, named Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), is data-driven, single-loop, and single-sample-path. We characterize the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium, building on a novel multi-time-scale analysis. We support the theoretical results with illustrative numerical simulations. As an additional contribution, we show how the analysis developed here can benefit the literature on average-reward MDPs. An MFG reduces to a standard MDP when the transition kernel and reward are independent of the mean field. As a byproduct of our analysis for MFGs, we obtain an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a convergence guarantee matching the state of the art. The prior bound, however, is derived under the assumption that the Bellman operator is contractive, which never holds in average-reward MDPs. Our analysis removes this assumption.
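
The equilibrium notion described in the abstract can be written as a fixed-point condition. The following is only a sketch in assumed notation, not the paper's own: $\mu^\star(\pi)$ stands for the mean field induced by policy $\pi$, $P^{\mu}$ and $r^{\mu}$ for the transition kernel and reward under mean field $\mu$, and $J(\pi, \mu)$ for the long-run average reward, on the assumption that the MFG objective is of average-reward type.

```latex
% Assumed notation (not from the paper): \hat{\pi}, \mu^\star, P^{\mu}, r^{\mu}, J
\hat{\pi} \in \arg\max_{\pi} \, J\bigl(\pi, \mu^\star(\hat{\pi})\bigr),
\qquad
J(\pi, \mu) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi,\, P^{\mu}}\Bigl[\, \sum_{t=0}^{T-1} r^{\mu}(s_t, a_t) \Bigr].
```

In this form, the reduction mentioned in the abstract is immediate: if $P^{\mu}$ and $r^{\mu}$ do not depend on $\mu$, the second argument of $J$ drops out and the condition becomes ordinary optimality in an average-reward MDP.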
