What question did this study set out to answer?

This study examines how effectively multi-agent reinforcement learning enables a swarm of unmanned surface vehicles to encircle a target.

June 7, 2026 | Open Access

Swarm unmanned surface vehicle encirclement task with multi-agent reinforcement learning

Key Points

Objective: assess how effectively multi-agent reinforcement learning (MARL) enables a swarm of unmanned surface vehicles (USVs) to encircle a target.
Two algorithms are compared: Multi-Agent Proximal Policy Optimization (MAPPO) and its recurrent extension, MAPPO-LSTM.
Both were trained and evaluated on the Unity 3D simulation platform with the ML-Agents toolkit.
Three defenders cooperatively encircle a target in a water environment adapted from a real map.
MAPPO-LSTM achieved higher cumulative rewards and better temporal stability.
The basic MAPPO model produced broader spatial coverage and a tighter angular formation.
The LSTM extension improved motion smoothness and coordination in encirclement behavior.

Abstract

Swarm unmanned surface vehicles (USVs) have become a key technology for future maritime defense because of their capability for cooperative, autonomous operation. Among swarm coordination strategies, encirclement tasks contribute directly to target containment, interception, and area protection. This study investigates a multi-agent reinforcement learning (MARL) approach to the swarm USV encirclement task, comparing two algorithms: the basic Multi-Agent Proximal Policy Optimization (MAPPO) and its recurrent extension, MAPPO-LSTM. Both algorithms were trained on the Unity 3D simulation platform using the ML-Agents toolkit. Three defenders cooperatively encircle a target in a water environment adapted from a real map. Performance is evaluated with four metrics: cumulative reward, angular coverage, maximum angular gap, and rotation number of encirclement. Experimental results show that MAPPO-LSTM achieves higher cumulative rewards and better temporal stability, while the basic MAPPO model produces broader spatial coverage and a tighter angular formation. The LSTM's temporal memory improves motion smoothness and coordination, resulting in more consistent encirclement behavior. These findings highlight the trade-off between spatial completeness and temporal coherence in USV swarm encirclement and underline the potential of the MARL framework for smart maritime defense applications.
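
The abstract names angular coverage and maximum angular gap as evaluation metrics but does not define them here. As a rough illustration only, the sketch below assumes one plausible definition: the maximum angular gap is the largest angle between adjacent defenders as seen from the target, and the angular coverage is the remainder of the full circle. The function names, the 2D coordinates, and these definitions are assumptions for illustration, not the study's implementation.

```python
import math

TWO_PI = 2 * math.pi

def bearing_angles(target, defenders):
    """Sorted bearings (radians in [0, 2*pi)) of each defender as seen from the target."""
    tx, ty = target
    return sorted(math.atan2(y - ty, x - tx) % TWO_PI for x, y in defenders)

def max_angular_gap(target, defenders):
    """Largest angle between angularly adjacent defenders (assumed definition)."""
    angles = bearing_angles(target, defenders)
    # Gaps between consecutive bearings, plus the wrap-around gap from last to first.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(TWO_PI - (angles[-1] - angles[0]))
    return max(gaps)

def angular_coverage(target, defenders):
    """Arc spanned by the defenders around the target (assumed: 2*pi minus the largest gap)."""
    return TWO_PI - max_angular_gap(target, defenders)

# Hypothetical positions: three defenders spaced evenly on a circle of radius 10.
target = (0.0, 0.0)
even = [(10 * math.cos(k * TWO_PI / 3), 10 * math.sin(k * TWO_PI / 3)) for k in range(3)]
print(math.degrees(max_angular_gap(target, even)))

# Hypothetical positions: three defenders bunched on one side of the target.
bunched = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
print(math.degrees(max_angular_gap(target, bunched)))
```

Under these assumed definitions, a smaller maximum gap means a tighter formation: three evenly spaced defenders give the minimum possible gap of 120 degrees, while defenders bunched on one side leave a wide opening through which the target could escape.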
