What question did this study set out to answer?

This research aims to improve the robustness of path planning in autonomous mobile robots using deep reinforcement learning.

April 26, 2026 | Open Access

Robust Path Planning via Deep Reinforcement Learning

Key Points

Aimed to improve the robustness of path planning for autonomous mobile robots using deep reinforcement learning.
Proposed a hybrid model combining deep reinforcement learning with traditional path planning.
Pre-processed LiDAR point cloud data to enhance state representation and computational efficiency.
Utilized a TD3 network and employed experience prioritization based on negative rewards.
Achieved significantly higher average reward values than existing reinforcement learning networks.
Demonstrated a more stable path-planning process with improved decision-making.
Averaged results over 10 independent runs with different random seeds to reduce the effect of random initialization.
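
The LiDAR pre-processing step mentioned above could look roughly like the following minimal sketch. The study does not describe its exact feature extraction here, so everything in it is an assumption made for illustration: the function name point_cloud_to_state, the choice of per-sector minimum distances as features, the height filter, and all default parameter values.

```python
import math


def point_cloud_to_state(points, num_sectors=20, max_range=10.0,
                         z_min=-0.2, z_max=1.0):
    """Compress a LiDAR point cloud into a short, normalized feature vector.

    Hypothetical illustration, not the paper's method. `points` is an
    iterable of (x, y, z) tuples in the robot frame. The result holds one
    value per angular sector: the distance to the nearest obstacle in that
    sector, divided by `max_range` (1.0 means nothing was seen).
    """
    if num_sectors <= 0:
        raise ValueError("num_sectors must be positive")

    sector_min = [max_range] * num_sectors
    sector_width = 2.0 * math.pi / num_sectors

    for x, y, z in points:
        # Drop points outside the assumed height band (floor, ceiling).
        if not (z_min <= z <= z_max):
            continue
        dist = math.hypot(x, y)
        # Ignore points at the sensor origin or beyond the usable range.
        if dist == 0.0 or dist > max_range:
            continue
        # Map the bearing in [-pi, pi] to a sector index.
        angle = math.atan2(y, x)
        idx = int((angle + math.pi) / sector_width) % num_sectors
        if dist < sector_min[idx]:
            sector_min[idx] = dist

    return [d / max_range for d in sector_min]
```

Under these assumptions, a cloud with thousands of points becomes a vector of num_sectors values, which is the kind of compact state a TD3 actor and critic can consume directly.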

Abstract

Deep reinforcement learning (DRL) for autonomous mobile robot navigation faces several inherent limitations. The stochastic nature of actions generated by DRL policies can undermine performance consistency, while inefficient exploration frequently delays the learning process or prevents the discovery of optimal solutions. This research aims to enhance the robustness of path planning by addressing these challenges. To achieve this goal, we propose a hybrid approach that integrates the flexible decision-making capabilities of deep reinforcement learning with the stability of traditional path planning. The proposed model adopts the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as its base. Notably, we pre-process LiDAR point cloud data to extract only essential features for the state representation, thereby preventing performance degradation from high-dimensional inputs and improving computational efficiency. Our model optimizes the learning process through two core strategies. First, it prioritizes experience data generated during training based on negative rewards, guiding the model to learn more frequently from critical failures rather than redundant successes. Second, it compares the action proposed by the TD3 network with a goal-oriented action from a classical path-planning algorithm in real time. By selecting the action with the higher estimated value, the model guides the policy toward a stable and effective trajectory from the earliest stages of training. To validate the efficacy of our approach, we conducted simulation-based experiments comparing the performance of the proposed model with existing reinforcement learning networks. To support statistical reliability and mitigate the impact of random initialization, all reported results are averaged over 10 independent runs with different random seeds. The results quantitatively demonstrate that our model achieves significantly higher and more stable reward values, confirming a robust improvement in the path-planning process.
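
The two core strategies described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the priority formula (a base weight plus the magnitude of any negative reward), and the q_value callable are all hypothetical. In a TD3 setting, q_value might wrap one critic or the minimum of the twin critics; the abstract does not specify which.

```python
import random


def negative_reward_priorities(rewards, base=1.0, eps=1e-3):
    """Assign a sampling weight to each stored transition.

    Assumed rule: transitions with non-negative reward get the base weight,
    and transitions with negative reward get extra weight proportional to
    how negative the reward was, so failures are replayed more often.
    """
    return [base + max(0.0, -r) + eps for r in rewards]


def sample_indices(rewards, batch_size, rng=random):
    """Draw replay-buffer indices (with replacement) using the weights above."""
    weights = negative_reward_priorities(rewards)
    return rng.choices(range(len(rewards)), weights=weights, k=batch_size)


def select_action(state, policy_action, planner_action, q_value):
    """Pick the candidate action with the higher estimated value.

    `policy_action` stands for the TD3 actor's proposal and `planner_action`
    for the classical planner's goal-oriented action. `q_value(state, action)`
    is a hypothetical callable returning the critic's estimate.
    """
    q_policy = q_value(state, policy_action)
    q_planner = q_value(state, planner_action)
    return policy_action if q_policy >= q_planner else planner_action
```

Early in training the critic tends to score the planner's action higher, so the executed trajectory stays close to the classical path; as the actor improves, its own proposals win the comparison more often.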
