August 7, 2024

Priority Over Quantity: A Self-Incentive Credit Assignment Scheme for Cooperative Multiagent Reinforcement Learning

Abstract

The centralized training and decentralized execution (CTDE) paradigm is widely employed to address nonstationarity and partial observability in multiagent reinforcement learning (MARL). One of the main challenges restricting the performance of the CTDE paradigm is credit assignment. Existing methods cannot sufficiently incentivize each agent to explore a broader solution space without compromising performance or factorization complexity. In this article, we propose a self-incentive credit assignment scheme that prioritizes individual agent actions, rather than being constrained by the quantity of collective policies, based on a novel factorization method called multihead residual value factorization (MRVF). MRVF learns an extra representation of value gradients from the cooperative behaviors and factorizes the residual global joint action value as a monotonic function, which effectively improves the representational capacity of the value function. Theoretical analysis indicates that our method has stronger representational ability and satisfies the individual-global-max (IGM) condition. Extensive experiments validate that our method achieves significant improvement in both learning speed and stability. In particular, it attains the best performance on two super-hard maps of the widely used StarCraft multiagent challenge (SMAC) benchmark, while its performance on the other SMAC scenarios is better than or on par with that of the state-of-the-art baseline.
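
For readers unfamiliar with the IGM condition mentioned above, it requires that the joint action maximizing the global value coincide with the tuple of each agent's individually greedy actions. The following is a minimal sketch of why a monotonic factorization satisfies it. It is not the MRVF architecture: the mixer here is a hypothetical weighted sum with strictly positive weights, and all names (`monotonic_mix`, `greedy_joint_action`, `greedy_individual_actions`) and the random per-agent values are assumptions made only for illustration.

```python
import itertools
import random


def monotonic_mix(q_values, weights, bias):
    """Hypothetical mixer: a weighted sum with positive weights, so the
    global value is monotonically increasing in every per-agent value."""
    return sum(w * q for w, q in zip(weights, q_values)) + bias


def greedy_joint_action(per_agent_q, weights, bias):
    """Brute-force argmax of the mixed global value over all joint actions."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(
        itertools.product(*action_ranges),
        key=lambda joint: monotonic_mix(
            [q[a] for q, a in zip(per_agent_q, joint)], weights, bias
        ),
    )


def greedy_individual_actions(per_agent_q):
    """Each agent picks its own argmax using only its local values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


# Illustrative setup (assumed): 3 agents, 4 actions each, random utilities.
rng = random.Random(0)
per_agent_q = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
weights = [rng.uniform(0.1, 2.0) for _ in range(3)]  # strictly positive
bias = 0.5

joint = greedy_joint_action(per_agent_q, weights, bias)
individual = greedy_individual_actions(per_agent_q)

# Under a monotonic mixer, centralized and decentralized greedy choices agree.
assert joint == individual
```

Because the global value never decreases when any single agent's value increases, each agent can act greedily on its own values at execution time without access to the mixer, which is what makes decentralized execution consistent with centralized training.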
