Abstract
Mobile robots are frequently required to operate in challenging environments, including post-disaster sites after earthquakes and floods and complex terrains such as polar regions, deserts, and construction sites, where obstacles are often deformable. The combination of uneven terrain and unpredictable obstacles poses significant challenges to a robot’s energy efficiency, safety, and operational effectiveness. This paper addresses these challenges by applying Deep Q-Network (DQN) reinforcement learning to the path planning of mobile robots in unknown environments, and proposes the “Deep Q-Network Unknown Complex Environment Path Planning” (DUCP) method. DUCP uses 2.5D maps as the model’s state space and incorporates a comprehensive heuristic reward function that guides learning toward shorter path length, lower energy consumption, and greater safety. Experimental results demonstrate that the proposed method enables robots to identify shorter, more energy-efficient, and safer paths in unknown environments. A comparative analysis shows that the 2.5D map model outperforms both the 2D and 3D map models: it achieves a higher task success rate and, compared with the 3D map model, reduces path length by 20.3% and energy consumption by 23.67% while increasing safety by 30.13%. The study also examines the impact of the integrated heuristic reward on model performance: incorporating it improves the mission success rate by 9.2%, reduces path length by 38.57%, decreases energy consumption by 51.68%, and enhances safety by 51.67%.
He et al. studied this question.
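To make the idea of a heuristic reward over a 2.5D map concrete, the following is a minimal sketch, not the reward actually used by DUCP, whose exact form is not given in the abstract. It assumes the 2.5D map is a grid of terrain heights and that the reward for one move combines progress toward the goal, an energy proxy, and a slope-based safety proxy. The function name `step_reward`, the three terms, and the weights `w_progress`, `w_energy`, and `w_safety` are all hypothetical choices made for illustration.

```python
import math


def step_reward(height_map, pos, nxt, goal,
                w_progress=1.0, w_energy=0.5, w_safety=0.5):
    """Hypothetical composite reward for one grid move on a 2.5D height map.

    height_map: list of rows, each a list of terrain heights (the 2.5D map).
    pos, nxt, goal: (row, col) cells; nxt is the cell reached by the move.
    All terms and weights are illustrative assumptions, not the paper's.
    """
    (r0, c0), (r1, c1) = pos, nxt
    dh = height_map[r1][c1] - height_map[r0][c0]   # height change of the move
    planar = math.hypot(r1 - r0, c1 - c0)          # horizontal step length

    # Path-length term: how much closer (in the plane) the move brings us.
    progress = math.dist(pos, goal) - math.dist(nxt, goal)

    # Energy proxy: 3D step length, plus an extra cost when climbing.
    energy = math.hypot(planar, dh) + max(dh, 0.0)

    # Safety proxy: steeper slopes are treated as riskier.
    slope = abs(dh) / planar if planar > 0 else 0.0

    return w_progress * progress - w_energy * energy - w_safety * slope


# Usage: the same move is rewarded less when it crosses a bump.
flat = [[0.0, 0.0, 0.0]]
hilly = [[0.0, 1.0, 0.0]]
r_flat = step_reward(flat, (0, 0), (0, 1), (0, 2))
r_hilly = step_reward(hilly, (0, 0), (0, 1), (0, 2))
```

In a DQN setting, a value like this would be returned by the environment after each action. The weights set the trade-off between path length, energy consumption, and safety and would need tuning for a given robot and terrain.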