The centralized training and decentralized execution (CTDE) paradigm is widely employed to address nonstationarity and partial observability in multiagent reinforcement learning (MARL). One of the main challenges restricting the performance of the CTDE paradigm is credit assignment. Existing methods cannot sufficiently incentivize each agent to explore a broader solution space without sacrificing performance or increasing factorization complexity. In this article, we propose a self-incentive credit assignment scheme that prioritizes individual agent actions based on a novel factorization method called multihead residual value factorization (MRVF), rather than being constrained by the quantity of collective policies. MRVF learns an extra representation of value gradients from the cooperative behaviors and factorizes the residual global joint action value as a monotonic function, which effectively improves the representability of the value function. Theoretical analysis indicates that our method has stronger representational ability and satisfies the individual-global-max (IGM) condition. Extensive experiments validate that our method achieves significant improvements in both learning speed and stability; in particular, it attains the best performance on two super-hard maps of the widely used StarCraft multiagent challenge (SMAC) benchmark, while its performance on the other SMAC scenarios is better than or on par with the state-of-the-art baseline.
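The IGM condition mentioned above can be illustrated with a minimal sketch. This is not the MRVF architecture, whose details are not given in the abstract; it assumes a generic monotonic mixer (a weighted sum with non-negative weights, in the spirit of QMIX-style factorization), and all function names, weights, and Q-values are hypothetical. It checks by brute force that the joint action formed from each agent's own greedy choice also maximizes the mixed joint value.

```python
import itertools


def monotonic_mix(agent_qs, weights, bias):
    """Mix per-agent values; abs() keeps every weight non-negative,
    so the result is monotonically non-decreasing in each agent's value."""
    total = bias
    for w, q in zip(weights, agent_qs):
        total += abs(w) * q
    return total


def joint_value(q_tables, action, weights, bias):
    """Mixed value of one joint action (one action index per agent)."""
    chosen = [q[a] for q, a in zip(q_tables, action)]
    return monotonic_mix(chosen, weights, bias)


def greedy_joint_action(q_tables):
    """Decentralized execution: each agent takes the argmax of its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def global_argmax(q_tables, weights, bias):
    """Centralized view: brute-force argmax over every joint action."""
    all_actions = itertools.product(*(range(len(q)) for q in q_tables))
    return max(all_actions, key=lambda a: joint_value(q_tables, a, weights, bias))


def igm_holds(q_tables, weights, bias, tol=1e-9):
    """IGM check: the greedy joint action attains the global maximum value."""
    greedy = joint_value(q_tables, greedy_joint_action(q_tables), weights, bias)
    best = joint_value(q_tables, global_argmax(q_tables, weights, bias), weights, bias)
    return greedy >= best - tol


# Hypothetical two-agent example: agent 0 has 2 actions, agent 1 has 3.
q_tables = [[1.0, 3.0], [2.0, 0.5, 4.0]]
weights = [0.5, -2.0]  # the sign is discarded by abs() inside the mixer
bias = 0.1

print(greedy_joint_action(q_tables))
print(igm_holds(q_tables, weights, bias))
```

Because the mixer is monotone in each agent's value, raising any one agent's chosen Q-value can never lower the joint value, which is why per-agent greedy selection suffices at execution time. A purely monotonic mixer restricts the class of joint value functions that can be represented; the abstract's residual factorization is described as improving on that representability while keeping IGM.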
Tang et al. (Wed,) studied this question.