June 24, 2024 (Open Access)

Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Abstract

This paper proposes a gradient-based multi-agent actor-critic algorithm for off-policy reinforcement learning using importance sampling. Our algorithm is incremental with full gradients, and its per-iteration complexity scales linearly with the dimension of the approximation features. Previous multi-agent actor-critic algorithms are limited to the on-policy setting or to off-policy emphatic temporal difference (TD) learning, and they do not take advantage of the advances in off-policy gradient temporal difference (GTD) learning. As a theoretical contribution, we establish that the critic step of the proposed algorithm converges to the TD solution of the projected Bellman equation and that the actor step converges to the set of asymptotically stable fixed points. Numerical experiments on the multi-agent generalization of Boyan's chain problem show that the proposed approach improves stability and convergence rate compared with the state-of-the-art baseline algorithm.
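The abstract does not give the update equations, so the following is only a rough illustration of the ingredients it names. The sketch below uses the standard off-policy TDC update (one member of the GTD family) with an importance-sampling ratio for each agent's critic, followed by a consensus-averaging step between agents, which is a common pattern in distributed TD learning. It is a sketch under stated assumptions, not the paper's algorithm: the ring MDP, the behavior policy `mu`, the target policy `pi`, the per-agent rewards, the step sizes and the mixing matrix `C` are all hypothetical; the agents share one joint action, so a single ratio ρ = π(a|s)/μ(a|s) is used (with factorized joint policies it would be the product of per-agent ratios); and only the critic is shown, so the actor step is omitted. Each `tdc_step` passes over the d features a constant number of times, which is the O(d) per-iteration cost mentioned above.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists, O(d) time."""
    return sum(x * y for x, y in zip(a, b))


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One importance-weighted TDC (gradient-TD) update with linear features.

    theta: critic weights, w: auxiliary weights, rho = pi(a|s) / mu(a|s).
    Each line is one pass over the d features, so the cost is O(d).
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)
    new_theta = [t + alpha * rho * (delta * f - gamma * phi_w * fn)
                 for t, f, fn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (rho * delta - phi_w) * f for wi, f in zip(w, phi)]
    return new_theta, new_w


def mix(vectors, C):
    """Consensus step: agent i replaces its vector by sum_j C[i][j] * vectors[j]."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(C[i][j] * vectors[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def run(n_states=5, n_agents=3, steps=5000, gamma=0.5, alpha=0.1, beta=0.2, seed=0):
    """Hypothetical ring MDP (NOT the paper's multi-agent Boyan's chain).

    Action 0 moves +1 and action 1 moves +2 (mod n_states). Behavior policy mu
    picks action 0 w.p. 0.5; target policy pi picks action 0 w.p. 0.9.
    Each agent observes its own state-dependent reward; features are one-hot.
    """
    rng = random.Random(seed)
    rewards = [[rng.random() for _ in range(n_states)] for _ in range(n_agents)]
    pi0, mu0 = 0.9, 0.5
    # Hypothetical symmetric, doubly stochastic mixing matrix (fully connected network).
    C = [[0.5 if i == j else 0.5 / (n_agents - 1) for j in range(n_agents)]
         for i in range(n_agents)]

    def one_hot(s):
        return [1.0 if k == s else 0.0 for k in range(n_states)]

    thetas = [[0.0] * n_states for _ in range(n_agents)]
    ws = [[0.0] * n_states for _ in range(n_agents)]
    s = 0
    for _ in range(steps):
        a = 0 if rng.random() < mu0 else 1  # act with the behavior policy mu
        # Importance-sampling ratio pi(a|s) / mu(a|s).
        rho = (pi0 if a == 0 else 1 - pi0) / (mu0 if a == 0 else 1 - mu0)
        s_next = (s + 1 + a) % n_states
        for i in range(n_agents):  # local critic updates with each agent's reward
            thetas[i], ws[i] = tdc_step(thetas[i], ws[i], one_hot(s), one_hot(s_next),
                                        rewards[i][s], rho, gamma, alpha, beta)
        thetas, ws = mix(thetas, C), mix(ws, C)  # exchange parameters with neighbors
        s = s_next
    return thetas, rewards
```

With one-hot features the projection is the identity, so the TD solution of the projected Bellman equation equals the value of `pi` under the agents' average reward, and each agent's `theta` returned by `run()` can be checked against it. The dense `mix` costs O(N²d) per step for N agents; with a sparse communication graph each agent would only sum over its neighbors.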

Author: Jineng Ren

DOI: https://doi.org/10.1007/s44196-024-00560-2