Swarms of unmanned surface vehicles (USVs) have become a key technology for future maritime defense because of their capability for cooperative, autonomous operation. Built on swarm coordination strategies, encirclement tasks contribute directly to target containment, interception, and area protection. This study investigates a multi-agent reinforcement learning (MARL) approach to the swarm USV encirclement task, comparing two algorithms: the basic Multi-Agent Proximal Policy Optimization (MAPPO) and its recurrent extension, MAPPO-LSTM. Both algorithms were trained on the Unity 3D simulation platform using the ML-Agents toolkit, with three defenders cooperatively encircling a target in a water environment adapted from a real map. Performance is evaluated using four metrics: cumulative reward, angular coverage, maximum angular gap, and the number of encirclement rotations. Experimental results show that MAPPO-LSTM achieves higher cumulative reward and better temporal stability, while the basic MAPPO model produces broader spatial coverage and a tighter angular formation. The temporal memory provided by the LSTM improves motion smoothness and coordination, resulting in more consistent encirclement behavior. These findings highlight a trade-off between spatial completeness and temporal coherence in USV swarm encirclement and underline the potential of the MARL framework for smart maritime defense applications.
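To make the two angular metrics concrete, the following is a minimal sketch of how they could be computed from 2D positions. The abstract does not give the exact definitions, so everything here is an assumption: the function names are hypothetical, the maximum angular gap is taken as the largest bearing difference between neighbouring defenders as seen from the target, and angular coverage is taken as the fraction of the full circle lying outside that largest gap.

```python
import math

TWO_PI = 2 * math.pi

def sorted_bearings(target, defenders):
    """Bearings of the defenders seen from the target, in [0, 2*pi), ascending."""
    tx, ty = target
    return sorted(math.atan2(y - ty, x - tx) % TWO_PI for x, y in defenders)

def max_angular_gap(target, defenders):
    """Largest angle (radians) between neighbouring defenders around the target.

    Assumed definition, not taken from the paper.
    """
    angles = sorted_bearings(target, defenders)
    # Gaps between consecutive bearings, plus the wrap-around gap
    # from the last bearing back to the first.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(TWO_PI - angles[-1] + angles[0])
    return max(gaps)

def angular_coverage(target, defenders):
    """Fraction of the circle not left open by the largest gap.

    Assumed definition, not taken from the paper.
    """
    return 1.0 - max_angular_gap(target, defenders) / TWO_PI

# Three defenders evenly spaced on a unit circle around the target.
even = [(1.0, 0.0),
        (-0.5, math.sqrt(3) / 2),
        (-0.5, -math.sqrt(3) / 2)]
even_gap = max_angular_gap((0.0, 0.0), even)

# Three defenders bunched into one quadrant.
bunched = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bunched_gap = max_angular_gap((0.0, 0.0), bunched)
```

Under these assumed definitions, the evenly spaced formation gives the smallest possible maximum gap for three defenders (120 degrees), while the bunched formation leaves a 270 degree opening, so a smaller gap corresponds to a tighter angular formation.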
Hamid et al. (Thu,) studied this question.