June 1, 2026 | Open Access

Path Planning Optimization of Autonomous Robots Based on Deep Reinforcement Learning

Key Points

The study aims to optimize path planning for autonomous robots in dynamic, unknown environments using deep reinforcement learning.
An improved Proximal Policy Optimization (PPO) approach was designed to enhance adaptability in path planning.
A hybrid reward function combining sparse and dense rewards was implemented to optimize navigation efficiency, safety, and smoothness.
A multi-head self-attention mechanism was integrated into the Actor-Critic network to improve perception of key obstacles in LiDAR data.
The method achieved a navigation success rate of 98.0% across six simulation scenarios with static and dynamic obstacles.
Average path length was reduced to 19.5 meters and average travel time to 36.2 seconds.
The trajectory smoothness index was 10.3 radians, the best among the compared methods, which included the baselines DWA, DDPG, and standard PPO.
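
The hybrid reward described above can be illustrated with a minimal sketch. This summary does not give the paper's actual reward terms or weights, so everything below is an assumption made for illustration: the `StepInfo` fields, the three dense terms (goal progress, obstacle clearance, change in yaw rate), and all numeric constants are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step quantities a simulator might supply (hypothetical names)."""
    prev_goal_dist: float         # distance to goal before the action, meters
    goal_dist: float              # distance to goal after the action, meters
    min_obstacle_dist: float      # closest LiDAR return, meters
    angular_velocity: float       # current yaw-rate command, rad/s
    prev_angular_velocity: float  # previous yaw-rate command, rad/s
    reached_goal: bool
    collided: bool


def hybrid_reward(info: StepInfo,
                  r_goal: float = 100.0, r_collision: float = -100.0,
                  w_progress: float = 5.0, w_safety: float = 1.0,
                  w_smooth: float = 0.5, safe_dist: float = 0.5,
                  step_penalty: float = -0.05) -> float:
    """Sparse event rewards plus dense per-step terms (assumed weights)."""
    # Sparse part: fires only on terminal events.
    if info.collided:
        return r_collision
    if info.reached_goal:
        return r_goal
    # Dense part: feedback on every step.
    progress = info.prev_goal_dist - info.goal_dist                       # efficiency
    safety = -max(0.0, safe_dist - info.min_obstacle_dist) / safe_dist    # safety
    smooth = -abs(info.angular_velocity - info.prev_angular_velocity)     # smoothness
    return (w_progress * progress + w_safety * safety
            + w_smooth * smooth + step_penalty)


# Example steps: one in open space, one close to an obstacle.
clear_step = StepInfo(5.0, 4.8, 1.0, 0.2, 0.2, False, False)
near_step = StepInfo(5.0, 4.8, 0.25, 0.2, 0.2, False, False)
clear_reward = hybrid_reward(clear_step)
near_reward = hybrid_reward(near_step)
```

In this sketch the sparse terms dominate at episode end, while the dense terms give the policy a learning signal on every step. The step near the obstacle earns less than the same step taken in open space.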

Abstract

This article proposes an optimization method based on an improved PPO (Proximal Policy Optimization) algorithm to address the poor adaptability and uneven motion of autonomous robot path planning in dynamic, unknown environments. First, a hybrid reward function that combines sparse event rewards with dense shaping rewards is designed to simultaneously optimize navigation efficiency, safety, and smoothness. Second, a multi-head self-attention mechanism is introduced into the Actor-Critic network of PPO to enhance the model's perception of key obstacles in LiDAR data. Experiments were conducted in six simulation scenarios containing static and dynamic obstacles. The proposed algorithm achieved the highest navigation success rate, 98.0%. The average path length and average travel time were reduced to 19.5 meters and 36.2 seconds, respectively, and the trajectory smoothness index was the best, at 10.3 radians. Overall, the method performed significantly better than baseline algorithms such as DWA (Dynamic Window Approach), DDPG (Deep Deterministic Policy Gradient), and standard PPO. An ablation experiment further confirmed the effectiveness and complementarity of the hybrid reward and the attention mechanism. This study provides a high-performance solution for robust robot navigation in complex environments.
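
To make the attention idea concrete, here is a dependency-free sketch of multi-head self-attention over a LiDAR scan. It is not the paper's network. The sector tokenization (`lidar_to_tokens`), the two-feature tokens, the random projection matrices, and the head sizes are all assumptions, and the usual final output projection is omitted for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for learned projection weights (random, for illustration only)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def lidar_to_tokens(ranges, num_sectors, max_range):
    """Split a scan into equal sectors; each token is (min, mean) normalized range.

    Beams left over when len(ranges) is not divisible by num_sectors are dropped.
    """
    size = len(ranges) // num_sectors
    tokens = []
    for s in range(num_sectors):
        sector = [min(r, max_range) / max_range
                  for r in ranges[s * size:(s + 1) * size]]
        tokens.append([min(sector), sum(sector) / len(sector)])
    return tokens


def multi_head_self_attention(tokens, heads):
    """heads is a list of (Wq, Wk, Wv); returns (features, per-head weights)."""
    outputs, weights = [], []
    for wq, wk, wv in heads:
        q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
        d_k = len(q[0])
        k_t = [list(col) for col in zip(*k)]  # transpose of K
        scores = matmul(q, k_t)               # Q K^T, one row per query sector
        attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
        outputs.append(matmul(attn, v))
        weights.append(attn)
    # Concatenate the heads along the feature axis.
    features = [sum((head[i] for head in outputs), []) for i in range(len(tokens))]
    return features, weights


rng = random.Random(0)
scan = [rng.uniform(0.2, 10.0) for _ in range(360)]  # synthetic 360-beam scan
tokens = lidar_to_tokens(scan, num_sectors=12, max_range=10.0)
heads = [tuple(random_matrix(2, 4, rng) for _ in range(3)) for _ in range(2)]
features, attn = multi_head_self_attention(tokens, heads)
```

Each row of `attn[h]` is a probability distribution over the 12 sectors, so a trained model could place most of its weight on the sectors that contain nearby or moving obstacles. In an Actor-Critic setup, `features` would then feed the policy and value heads.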
