Reinforcement learning (RL) algorithms perform well on sequential decision-making problems, which has made them popular in multi-agent collaborative control in recent years. However, bootstrapping and maximization in the iterative value update cause overestimation, which restricts the application of RL methods to real-world multi-agent systems with continuous action spaces. To alleviate the overestimation problem, we propose an advantaged clipped critic networks RL algorithm, which reduces overestimation by adding target networks and clipping the critic network, and improves stability by introducing an advantage function. We verified the algorithm in a multi-UAV collaborative environment based on the PyBullet physics engine. We designed three multi-UAV collaborative paradigms and compared the algorithm with PPO and DDPG. The experimental results show that, across the three cooperative tasks, the final average cumulative reward of the proposed algorithm improves by 26.67%, 13.35%, and 47.09%, and by 23.37%, 7.34%, and 13.05%, over the two baseline algorithms, respectively.
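The abstract does not give the update rule, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: that "clipping the critic" resembles the clipped double-Q target (taking the minimum of two target-critic estimates), and that the advantage is the usual A(s, a) = Q(s, a) - V(s). The function names, the discount value, and the sample numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def clipped_td_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Assumed clipped target: bootstrap from the smaller of two target-critic values.

    Taking the element-wise minimum counteracts the upward bias that
    bootstrapping plus maximization introduces into a single critic.
    """
    q_min = np.minimum(q1_next, q2_next)
    # (1 - done) removes the bootstrap term on terminal transitions.
    return reward + gamma * (1.0 - done) * q_min


def advantage(q_value, state_value):
    """Assumed advantage: how much better an action is than the state's baseline value."""
    return q_value - state_value


# Hypothetical batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])
q1_next = np.array([10.0, 4.0])  # target critic 1 estimate at the next state
q2_next = np.array([8.0, 6.0])   # target critic 2 estimate at the next state

target = clipped_td_target(reward, done, q1_next, q2_next, gamma=0.9)
adv = advantage(np.array([3.0, 1.0]), np.array([1.0, 2.0]))
```

In this sketch the first transition bootstraps from the smaller estimate (8.0 rather than 10.0), and the terminal transition keeps only its immediate reward. How the published algorithm combines the clipped target with the advantage term may differ.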
Zhang et al. studied this question.