ABSTRACT This paper addresses the multi‐UAV encirclement problem in the presence of obstacles and proposes an improved method that integrates the extended Kalman filter (EKF) into the multi‐agent deep deterministic policy gradient (MADDPG) algorithm. First, the EKF is employed to estimate the target position, which provides the position information for the subsequent encirclement strategy. Then, the hunting points are calculated from the estimated target position and allocated to the UAVs so that all UAVs can reach their assigned hunting points simultaneously and in the shortest time. Moreover, a composite reward function is designed to guide the UAVs' decisions in the encirclement task; it includes a segmented reward function that trains each UAV to perform smooth obstacle avoidance. Extensive training experiments verify the convergence and effectiveness of the proposed algorithm, supporting the efficient execution of the UAV encirclement task.
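The hunting-point step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the geometry or the allocation rule, so everything here is an assumption made for illustration: the hunting points are spaced evenly on a circle around the estimated target, and the allocation is a brute-force search that minimizes the longest UAV-to-point distance (a simple proxy for simultaneous arrival in the shortest time). The names `hunting_points` and `assign_points` are hypothetical and do not come from the paper.

```python
import math
from itertools import permutations


def hunting_points(target, radius, n):
    """Place n points evenly on a circle of the given radius around the target.

    Assumption: an evenly spaced circular formation; the paper may use
    a different geometry.
    """
    tx, ty = target
    return [
        (tx + radius * math.cos(2 * math.pi * k / n),
         ty + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def assign_points(uavs, points):
    """Return a tuple a such that UAV i is sent to points[a[i]].

    Assumption: the allocation minimizes the longest travel distance,
    with total distance as a tie-breaker. Brute force over all
    permutations, so it is only practical for a handful of UAVs.
    """
    def cost(perm):
        dists = [math.dist(u, points[j]) for u, j in zip(uavs, perm)]
        return (max(dists), sum(dists))

    return min(permutations(range(len(points))), key=cost)


# Usage: four UAVs around a target estimated at the origin.
estimated_target = (0.0, 0.0)  # in the paper this would come from the EKF
points = hunting_points(estimated_target, radius=1.0, n=4)
uavs = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
assignment = assign_points(uavs, points)
```

For larger teams, the exhaustive search would normally be replaced by a polynomial-time assignment solver; the brute-force version is used here only to keep the sketch dependency-free.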
Zhang et al. (Sun,) studied this question.