Multiagent reinforcement learning (MARL) algorithms have shown promise for Internet of Things devices, such as unmanned aerial vehicle (UAV) swarms. However, the dynamic nature of large-scale swarm systems, in which the number of agents and the number of observed neighbors change constantly, makes MARL difficult to adapt. Existing approaches struggle to extract meaningful features and require a substantial number of experience samples, resulting in low sample efficiency and high risk ratios. Moreover, these methods are effective only in task-specific scenarios and fail to perform well in multitask settings. To overcome these challenges, this study proposes a sample-efficient and scalable MARL approach for UAV swarms. The proposed approach incorporates a hypernetwork-based embedding attention (HEA) mechanism for the state representation of the policy network and a multiencoder gated transformer with a multilayer attention (MEGTrMA) mechanism for the value function. The HEA automatically generates weights for each agent to adapt to dynamic scenarios, which enhances representation ability and adaptability and reduces the cost of trial and error, improving learning efficiency. The MEGTrMA captures the contribution of each agent to the global observation, establishing long-term dependencies among agents and facilitating stable policy learning in multitask scenarios. Simulation results demonstrate that the proposed method is scalable, generalizable, and sample-efficient. Compared to learning from scratch, training that progressively increases the number of UAVs and their corresponding neighbors reduces training time to less than one-fifth of the initial time. Additionally, the average number of collisions is reduced by an order of magnitude for large-scale UAV swarms.
Cheng et al. studied this question.
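The abstract describes the HEA only at a high level: a hypernetwork generates weights per agent, and attention produces a state representation that copes with a changing number of neighbors. The sketch below illustrates that general idea and is not the authors' architecture. The function names (`init_params`, `pool_neighbors`), the single linear hypernetwork, the dot-product attention scoring, and all dimensions are assumptions made for illustration. It shows how a variable-length list of neighbor observations can be reduced to a fixed-size vector.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(obs_dim, emb_dim, seed=0):
    """Random parameters for the illustrative (hypothetical) model."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        # Hypernetwork (assumed linear): observation -> flattened weight matrix.
        "hyper": rand(emb_dim * obs_dim, obs_dim),
        # Maps the agent's own observation to an attention query.
        "query": rand(emb_dim, obs_dim),
        "obs_dim": obs_dim,
        "emb_dim": emb_dim,
    }


def pool_neighbors(own_obs, neighbor_obs, params):
    """Return a fixed-size representation of any number of neighbors."""
    obs_dim, emb_dim = params["obs_dim"], params["emb_dim"]
    if not neighbor_obs:
        return [0.0] * emb_dim, []

    query = matvec(params["query"], own_obs)
    embeddings = []
    for obs in neighbor_obs:
        # The hypernetwork generates an embedding matrix for this neighbor.
        flat = matvec(params["hyper"], obs)
        weights = [flat[r * obs_dim:(r + 1) * obs_dim] for r in range(emb_dim)]
        embeddings.append(matvec(weights, obs))

    # Scaled dot-product scores between the agent's query and each embedding.
    scale = math.sqrt(emb_dim)
    scores = [sum(q * e for q, e in zip(query, emb)) / scale for emb in embeddings]
    attn = softmax(scores)

    # The attention-weighted sum has the same size for any neighbor count.
    pooled = [sum(a * emb[k] for a, emb in zip(attn, embeddings)) for k in range(emb_dim)]
    return pooled, attn


if __name__ == "__main__":
    params = init_params(obs_dim=4, emb_dim=3)
    own = [0.1, -0.2, 0.3, 0.0]
    for count in (2, 5):
        neighbors = [[0.1 * (i + j) for j in range(4)] for i in range(count)]
        pooled, attn = pool_neighbors(own, neighbors, params)
        print(count, len(pooled), len(attn))
```

Because the pooled vector keeps the same length whether the agent sees two neighbors or five, a policy network built on top of it would not need to be resized as the swarm grows. That property is one plausible reason this kind of representation suits the progressive scaling the abstract reports, though the paper itself should be consulted for the actual design.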