ABSTRACT Policy heterogeneity is crucial for achieving sophisticated coordination in complex collaborative tasks and has emerged as one of the key challenges in multi‐agent reinforcement learning (MARL) in recent years. Notably, the grouping paradigm has made remarkable progress in addressing policy heterogeneity. However, most existing grouping methods require predefining the number of groups or the composition and size of each group; these settings must be configured separately for each scenario and are difficult to choose without sufficient expert knowledge. By contrast, we propose a novel MARL grouping algorithm named Credit‐driven adaptive Grouping (CreateG), which divides the entire training process into multiple phases and reallocates poorly adapted (low‐credit) individuals in each phase. Through this mechanism, an environment‐adaptive grouping is ultimately formed. Furthermore, we design a hierarchical hypernetwork architecture to accommodate this adaptive grouping mechanism. Experiments conducted on StarCraft II micromanagement hard and superhard tasks, Google Research Football, and TAG scenarios show that CreateG achieves state‐of‐the‐art MARL performance. Moreover, extensive ablation studies elucidate the operational mechanism of the grouping strategy and other components and demonstrate how they enhance overall performance.
Liu et al. (Thu,) studied this question.