Multi-agent perimeter defense is central to cooperative defense by unmanned swarms. However, existing deep reinforcement learning approaches struggle to exploit both coordination and temporal information under constrained local communication, and they generalize poorly when the swarm size varies. To address these challenges, this paper proposes GR-MAPPO, a multi-agent reinforcement learning strategy that combines coordination under local communication constraints with spatiotemporal feature modeling. Specifically, a GraphSAGE-based spatial aggregation module enhances information exchange among defenders, while a GRU-based temporal encoding module processes historical observation sequences to improve coordination and anticipatory capability. Furthermore, to overcome scalability limitations, the inductive node-level aggregation mechanism lets agents adapt to varying numbers of local neighbors, eliminating dependence on a fixed swarm size. Experimental results demonstrate that GR-MAPPO consistently improves capture performance under limited communication and retains performance better when transferred across swarm scales.
Tan et al. studied this question.
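
The abstract's two building blocks, neighbor aggregation that does not depend on swarm size and a recurrent encoding of the observation history, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean aggregator, the bias-free GRU cell, the dimensions `OBS` and `HID`, the random untrained weights, and all function names are assumptions made for illustration, and the MAPPO training loop is omitted entirely.

```python
import math
import random

random.seed(0)
OBS, HID = 4, 3  # hypothetical observation and hidden sizes

def rand_matrix(rows, cols):
    # Small random weights; a trained policy would learn these.
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    # Row vector (length rows) times a rows x cols matrix.
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# GraphSAGE-style layer: weights act on [own observation, mean of neighbors].
W_sage = rand_matrix(2 * OBS, HID)
# GRU weights for the update gate z, reset gate r, and candidate state n.
W_z, W_r, W_n = (rand_matrix(2 * HID, HID) for _ in range(3))

def sage_mean(self_obs, neighbor_obs):
    # Mean-pool over however many neighbors are in range (possibly none),
    # so the layer's input size never depends on the swarm size.
    k = len(neighbor_obs)
    pooled = [sum(col) / k for col in zip(*neighbor_obs)] if k else [0.0] * OBS
    return [max(0.0, v) for v in matvec(self_obs + pooled, W_sage)]  # ReLU

def gru_step(x, h):
    # One bias-free GRU update of the hidden state h given input x.
    z = [sigmoid(v) for v in matvec(x + h, W_z)]
    r = [sigmoid(v) for v in matvec(x + h, W_r)]
    gated = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in matvec(x + gated, W_n)]
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def encode(history):
    # history: list of (self_obs, neighbor_obs) pairs, oldest first.
    h = [0.0] * HID
    for self_obs, neighbor_obs in history:
        h = gru_step(sage_mean(self_obs, neighbor_obs), h)
    return h

def random_obs():
    return [random.gauss(0.0, 1.0) for _ in range(OBS)]

# Same encoder, different neighbor counts per step.
few = [(random_obs(), [random_obs() for _ in range(2)]) for _ in range(5)]
many = [(random_obs(), [random_obs() for _ in range(7)]) for _ in range(5)]
print(len(encode(few)), len(encode(many)))  # both equal HID
```

Because the aggregator pools over whatever neighbors are present, the same weights apply to a defender with two neighbors or seven, which is the property the abstract relies on for cross-scale transfer. In a full system the resulting hidden state would presumably feed the actor and critic networks.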