Multi-agent reinforcement learning (MARL) addresses cooperation and competition in complex, dynamic environments through distributed decision-making, and in recent years has made significant progress in areas such as autonomous driving and robot swarm control. In this paper, we systematically review the theoretical framework of MARL, its mainstream methods (e.g., MADDPG, QMIX), commonly used benchmark environments (SMAC, Pommerman), and evaluation criteria (win rate, convergence speed), and we analyze the core challenges facing existing methods, such as non-stationarity and credit assignment. Experiments show that the hybrid method achieves a win rate of more than 85% on StarCraft II, but communication efficiency and scalability still need improvement. As a direction for subsequent research, we propose combining graph neural networks with meta-learning.
Jinsong Leng (Thu,) studied this question.