Abstract
As water pollution grows more severe, intelligent cleaning technologies based on unmanned systems have attracted widespread attention. Floating debris on water surfaces is dynamic: it moves randomly and is prone to outflow. As a result, existing reinforcement learning methods face several challenges in practical scheduling tasks, including inadequate prediction of future states, weak prioritization of critical targets, and insufficient consideration of resource constraints. This paper proposes cross-attention-proximal policy optimization (CA-PPO), a task allocation algorithm for a single unmanned surface vehicle (USV) built on the proximal policy optimization (PPO) algorithm. The approach constructs a cleaning task environment that incorporates a flow prediction mechanism for floating debris and accounts for the battery and load limitations of the USV. A cross-attention module enhances the policy network’s perception of key debris targets. Experimental results show that CA-PPO outperforms traditional heuristic approaches and other reinforcement learning algorithms in debris collection rate, outflow rate, energy efficiency, and movement distance efficiency.
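To make the cross-attention idea concrete, the following is a minimal sketch, not the authors' architecture. It assumes the USV state acts as a single query vector and each debris item contributes one key and one value vector, and it omits the learned projection matrices a real policy network would use. The function names (`softmax`, `cross_attention`) and all feature values are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention with one query and several key/value pairs.

    Assumption: identity projections (no learned weights), so this only
    illustrates the weighting mechanism, not a trained module.
    """
    dim = len(query)
    # Similarity of the query to each key, scaled by sqrt(dim).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    # Weighted sum of the value vectors gives the attended context.
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


# Hypothetical 2-D features: one USV query and three debris items.
usv_query = [1.0, 0.0]
debris_keys = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
debris_values = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

weights, context = cross_attention(usv_query, debris_keys, debris_values)
```

In this toy setup the first debris item is best aligned with the query, so it receives the largest attention weight. In a policy network, the resulting context vector would typically be passed on to the actor and critic heads, though how CA-PPO wires this is not specified in the abstract.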
Li et al. (Thu,) studied this question.