What question did this study set out to answer?

The aim is to improve the collaborative control of multiple surface vessels using a refined MAPPO framework.

February 17, 2026 | Open Access

An Improved MAPPO for Multi-Surface Vessel Collaboration

Key Points

The aim is to improve the collaborative control of multiple surface vessels using a refined MAPPO framework.
The enhanced MAPPO framework incorporates a counterfactual baseline from CMAPG.
Generalized Advantage Estimation (GAE) is integrated to improve the quality of the learning signal.
Prioritized Experience Replay (PER) with importance sampling is added to improve sample efficiency.
Simulations evaluate control performance under different scenarios.
The new framework addresses inefficient credit assignment, leading to faster convergence in sparse reward situations.
Simulation results demonstrate enhanced control performance compared to conventional MAPPO.
The adaptation of the PER mechanism yields significant improvements in learning efficiency.
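
As a rough illustration of the prioritized-replay points above, the following sketch shows proportional prioritization with importance-sampling weights in the form commonly used for PER. It is not the paper's adapted mechanism, whose details are not given here. The function names, the exponents `alpha` and `beta`, and their default values are assumptions chosen for illustration.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha.

    alpha = 0 gives uniform sampling; larger alpha favours high-priority items.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), scaled so max(w) = 1.

    The weights shrink the update for items that are sampled more often than
    uniform sampling would allow, which offsets the bias from prioritization.
    """
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_batch(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw indices according to priority and return them with their weights."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
    probs = per_probabilities(priorities, alpha)
    weights = importance_weights(probs, beta)
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return indices, [weights[i] for i in indices]
```

With priorities `[1.0, 2.0, 4.0]` and `alpha = beta = 1.0`, the weights come out as `[1.0, 0.5, 0.25]`: the transition sampled most often receives the smallest weight in the loss.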

Abstract

Collaborative control of multiple surface vessels remains a significant challenge in autonomous maritime operations, particularly in environments with sparse rewards. Conventional Multi-Agent Proximal Policy Optimization (MAPPO) often suffers from inefficient credit assignment and slow convergence in such scenarios. To address these limitations, this paper proposes an enhanced MAPPO framework that integrates a counterfactual baseline, derived from Counterfactual Multi-Agent Policy Gradients (CMAPG), into the Generalized Advantage Estimation (GAE) formulation. A Prioritized Experience Replay (PER) mechanism with importance sampling is also incorporated to improve sample efficiency. The counterfactual baseline provides precise, agent-specific learning signals within the on-policy paradigm, directly addressing the credit assignment problem. The PER mechanism, adapted with importance sampling, improves sample efficiency by reusing valuable past experiences without compromising stability. Together, the two components refine credit assignment by isolating individual contributions and make fuller use of historical experience. Simulation results and comparisons validate the enhanced control performance of the proposed controller.
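
To make the idea of a counterfactual baseline inside GAE more concrete, here is a minimal sketch for a single agent with discrete actions. The baseline is the agent's policy-weighted average of the joint action value, with the other agents' actions held fixed. Using that baseline in place of the state value in the GAE temporal-difference residual is an assumption made for illustration; the abstract does not give the exact formulation. All function and parameter names are hypothetical.

```python
def counterfactual_baseline(policy_probs, q_values):
    """Baseline b_i = sum over a' of pi_i(a') * Q(s, (a_-i, a')).

    policy_probs: agent i's probability for each of its own actions.
    q_values: joint action value for each of agent i's actions, with the
              other agents' actions held at what they actually did.
    """
    return sum(p * q for p, q in zip(policy_probs, q_values))


def gae_with_baseline(rewards, baselines, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    baselines must have len(rewards) + 1 entries; the final entry is the
    bootstrap value for the state after the last step (0.0 if terminal).
    Assumption: the counterfactual baseline stands in for V(s_t) here.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual measured against the agent-specific baseline.
        delta = rewards[t] + gamma * baselines[t + 1] - baselines[t]
        # Exponentially weighted sum of residuals, discounted by gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Because each agent's baseline marginalizes only its own action, the resulting advantage reflects how much that agent's chosen action changed the outcome relative to its average behaviour, which is the credit-assignment effect the abstract describes.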
