This work proposes a cooperative trajectory planning scheme for multiple unmanned aerial vehicles (UAVs), built on multi-agent reinforcement learning with hybrid critics, that improves search-and-tracking efficiency and fairness when the unmanned surface vehicles (USVs) in a dynamic swarm outnumber the UAVs. A confidence map of the targets’ existence probability with spatio-temporal decay is first established through a local information fusion mechanism based on Bayesian updating. This allows the problem to be reformulated as a communication-enhanced partially observable Markov decision process. To suppress policy variance and credibility imbalance among the UAVs, a center-sub-critics deep deterministic policy gradient algorithm is then proposed, which combines multiple centralized critics with decentralized critics. In addition, a segmented reward function is designed to incentivize each UAV to revisit detected targets. Finally, simulation comparisons against diverse baseline algorithms demonstrate the efficacy and scalability of the proposed scheme.
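To make the confidence-map idea concrete, the following is a minimal sketch of a per-cell Bayesian update combined with a decay step. It is not the authors' implementation. The detection probability `p_d`, false-alarm probability `p_f`, time constant `tau`, the uninformed prior of 0.5, the exponential decay form, and all function names are assumptions chosen for illustration. The sketch covers only temporal decay for a single UAV; the spatial component and the multi-UAV fusion described in the abstract are omitted.

```python
import math

PRIOR = 0.5  # assumed uninformed prior: target equally likely present or absent


def bayes_update(p, detected, p_d=0.9, p_f=0.1):
    """Posterior existence probability for one cell after one sensor reading.

    p_d (detection probability) and p_f (false-alarm probability) are
    illustrative values, not taken from the paper.
    """
    if detected:
        num = p_d * p
        den = num + p_f * (1.0 - p)
    else:
        num = (1.0 - p_d) * p
        den = num + (1.0 - p_f) * (1.0 - p)
    return num / den


def decay(p, dt, tau=10.0):
    """Relax confidence toward the prior as information ages (assumed exponential form)."""
    return PRIOR + (p - PRIOR) * math.exp(-dt / tau)


def step_map(conf, observations, dt=1.0):
    """Age every cell by dt, then apply Bayesian updates to the observed cells.

    conf: 2D list of probabilities.
    observations: dict mapping (row, col) -> bool (True if a target was detected).
    """
    new = [[decay(p, dt) for p in row] for row in conf]
    for (r, c), detected in observations.items():
        new[r][c] = bayes_update(new[r][c], detected)
    return new


# Usage: a 3x3 map, one positive and one negative observation.
conf = [[PRIOR] * 3 for _ in range(3)]
conf = step_map(conf, {(1, 1): True, (0, 0): False})
```

With these assumed parameters, a detection raises an uninformed cell toward 1, a miss lowers it toward 0, and unobserved cells drift back to the prior, which is what would push a UAV to revisit cells whose information has gone stale.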
Ye Hou
Bo Li
Xueru Miao
Drones
Shanghai Maritime University
Hou et al. (Wed,) studied this question.
www.synapsesocial.com/papers/698d6eeb5be6419ac0d54e71 — DOI: https://doi.org/10.3390/drones10020123