This article proposes an optimization method based on an improved PPO (Proximal Policy Optimization) algorithm to address the poor adaptability and jerky motion of autonomous robot path planning in dynamic, unknown environments. First, a hybrid reward function is designed that combines sparse event rewards with dense shaping rewards to jointly optimize navigation efficiency, safety, and smoothness. Second, a multi-head self-attention mechanism is introduced into the actor-critic network of PPO to strengthen the model's perception of key obstacles in LiDAR data. Experiments were conducted in six simulation scenarios containing static and dynamic obstacles. The proposed algorithm achieved the highest navigation success rate, 98.0%, reduced the average path length and average travel time to 19.5 meters and 36.2 seconds, respectively, and obtained the best trajectory smoothness index, 10.3 radians, significantly outperforming baseline algorithms such as DWA (Dynamic Window Approach), DDPG (Deep Deterministic Policy Gradient), and standard PPO. Ablation experiments further confirmed that the hybrid reward and the attention mechanism are each effective and complement one another. This study provides a high-performance solution for robust robot navigation in complex environments.
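To make the idea of a hybrid reward concrete, the following is a minimal sketch of how sparse event rewards and dense shaping rewards might be combined. The abstract does not give the reward terms or their weights, so every name and value here (`hybrid_reward`, `w_progress`, `w_safety`, `w_smooth`, `safe_radius`, `goal_bonus`, `collision_penalty`) is a hypothetical choice made for illustration, not the formulation used in the study.

```python
def hybrid_reward(prev_dist, dist, min_obstacle_dist, angular_velocity,
                  reached_goal=False, collided=False,
                  w_progress=1.0, w_safety=0.5, w_smooth=0.1,
                  safe_radius=0.5, goal_bonus=100.0, collision_penalty=-100.0):
    """Illustrative hybrid reward; all weights and terms are assumptions."""
    # Sparse event rewards: issued only when a terminal event occurs.
    if collided:
        return collision_penalty
    if reached_goal:
        return goal_bonus

    # Dense shaping rewards: issued at every step.
    # Efficiency: positive when the robot moved closer to the goal.
    progress = w_progress * (prev_dist - dist)
    # Safety: penalty that grows as the nearest obstacle gets inside safe_radius.
    safety = -w_safety * max(0.0, safe_radius - min_obstacle_dist)
    # Smoothness: penalty proportional to the magnitude of the turn rate.
    smoothness = -w_smooth * abs(angular_velocity)

    return progress + safety + smoothness


# One step: 0.5 m of progress, obstacles far away, no turning.
step_reward = hybrid_reward(prev_dist=2.0, dist=1.5,
                            min_obstacle_dist=1.0, angular_velocity=0.0)
```

In this sketch the sparse terms dominate at episode end, while the dense terms give the policy a learning signal at every step. Under these assumed terms, the smoothness penalty is what would discourage oscillating headings.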
Yuxuan Li (Thu,) studied this question.