Ground autonomous mobile robots are increasingly critical for reconnaissance, patrol, and resupply tasks in public safety and national defense scenarios, where global path planning on uneven 3D terrain remains a major challenge. Traditional planners struggle with high dimensionality, while Deep Reinforcement Learning (DRL) is hindered by two key issues: (1) systematic overestimation of action values (Q-values) due to function approximation error, which leads to suboptimal policies and training instability; and (2) inefficient exploration under sparse reward signals. To address these limitations, we propose DSAC-ICM: a Distributional Soft Actor-Critic framework integrated with an Intrinsic Curiosity Module (ICM). Rather than estimating scalar Q-values, our method learns the full probability distribution of state-action returns, which mitigates value overestimation. We further integrate the ICM to generate dense intrinsic rewards that guide the agent toward novel, unvisited states, addressing the exploration challenge. Comprehensive experiments in a suite of realistic 3D uneven-terrain environments demonstrate that DSAC-ICM enables the agent to learn effective navigation. It also achieves a better trade-off between path quality and computational cost than traditional path planning algorithms. Furthermore, DSAC-ICM significantly outperforms other RL baselines in convergence speed and return.
Zhou et al. studied this question.
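The abstract does not spell out how the ICM computes its intrinsic reward. A common formulation, assumed here purely for illustration, uses the prediction error of a learned forward model: transitions the model predicts poorly are treated as novel and earn a larger bonus. The following is a minimal sketch under that assumption. The linear model, the feature vectors, and the constants `ETA` and `LR` are all hypothetical and not taken from the paper.

```python
import random

ETA = 0.5   # hypothetical scaling factor for the intrinsic reward
LR = 0.05   # hypothetical learning rate of the forward model


class LinearForwardModel:
    """Toy forward model: predicts next-state features from (features, action)."""

    def __init__(self, feat_dim, act_dim, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + act_dim
        # One weight row per predicted feature, small random initialisation.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(feat_dim)]

    def predict(self, feat, action):
        x = list(feat) + list(action)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def update(self, feat, action, next_feat):
        """One SGD step on 0.5 * ||prediction - next_feat||^2."""
        x = list(feat) + list(action)
        pred = self.predict(feat, action)
        for row, p, target in zip(self.w, pred, next_feat):
            err = p - target
            for j, xj in enumerate(x):
                row[j] -= LR * err * xj


def intrinsic_reward(model, feat, action, next_feat):
    """Curiosity bonus: scaled squared prediction error of the forward model."""
    pred = model.predict(feat, action)
    return 0.5 * ETA * sum((p - t) ** 2 for p, t in zip(pred, next_feat))


model = LinearForwardModel(feat_dim=3, act_dim=2)
# Made-up transitions: (state features, action, next-state features).
seen = ([0.2, -0.1, 0.4], [1.0, 0.0], [0.3, 0.0, 0.5])     # visited repeatedly
novel = ([0.9, 0.8, -0.7], [0.0, 1.0], [-0.6, 0.9, 0.2])   # never visited

r_before = intrinsic_reward(model, *seen)
for _ in range(200):          # the agent keeps revisiting the same transition
    model.update(*seen)
r_after = intrinsic_reward(model, *seen)
r_novel = intrinsic_reward(model, *novel)

print(f"seen, before training: {r_before:.4f}")
print(f"seen, after training:  {r_after:.2e}")
print(f"novel transition:      {r_novel:.4f}")
```

In this toy setup the bonus for the repeatedly visited transition decays toward zero as the forward model fits it, while the unvisited transition keeps a comparatively large bonus. This is the mechanism by which a curiosity signal can stay dense even when the extrinsic reward is sparse. A full agent would add this bonus to the environment reward before the critic update, and would typically learn the feature encoder as well.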