March 3, 2026 · Open Access

DSAC-ICM: A Distributional Reinforcement Learning Framework for Path Planning in 3D Uneven Terrains

Key Points

The DSAC-ICM framework enabled the agent to learn effective navigation in realistic 3D uneven-terrain environments.
Compared with traditional path planning algorithms, the proposed method achieved a better trade-off between path quality and computational cost; it also converged faster and reached higher returns than other RL baselines.
The distributional critic learned the full probability distribution of state-action returns instead of a scalar Q-value, mitigating value overestimation.
Dense intrinsic rewards from the ICM guided agents toward novel states, improving exploration efficiency.
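
The curiosity mechanism in the key point above can be sketched as follows, assuming the standard ICM formulation of Pathak et al. (2017), in which the intrinsic reward is the forward model's prediction error in feature space: r_i = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2. The TinyICM class, its frozen random linear encoder, its linear forward model, and the parameters eta and lr are hypothetical simplifications (the full ICM also trains the encoder through an inverse-dynamics model), and the paper's actual network design is not described here. The example shows the property the key point relies on: repeating a transition makes it predictable, so its intrinsic reward shrinks.

```python
import random


class TinyICM:
    """Toy curiosity module (hypothetical sketch, not the paper's implementation).

    phi(s) is a frozen random linear projection standing in for a learned encoder;
    a linear forward model predicts phi(s') from [phi(s), a].
    """

    def __init__(self, state_dim, action_dim, feat_dim=4, eta=1.0, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Frozen random encoder weights: feat_dim x state_dim.
        self.enc = [[rng.uniform(-1.0, 1.0) for _ in range(state_dim)] for _ in range(feat_dim)]
        # Forward model weights: feat_dim x (feat_dim + action_dim), initialised to zero.
        self.fwd = [[0.0] * (feat_dim + action_dim) for _ in range(feat_dim)]
        self.eta = eta
        self.lr = lr

    def phi(self, s):
        # Feature embedding phi(s) = enc @ s.
        return [sum(w * x for w, x in zip(row, s)) for row in self.enc]

    def _predict(self, z):
        # Forward model prediction phi_hat(s') = fwd @ [phi(s), a].
        return [sum(w * x for w, x in zip(row, z)) for row in self.fwd]

    def intrinsic_reward(self, s, a, s_next):
        # r_i = (eta / 2) * squared prediction error in feature space.
        z = self.phi(s) + list(a)
        err = [p - t for p, t in zip(self._predict(z), self.phi(s_next))]
        return 0.5 * self.eta * sum(e * e for e in err)

    def update(self, s, a, s_next):
        # One gradient-descent step on 0.5 * ||phi_hat(s') - phi(s')||^2.
        z = self.phi(s) + list(a)
        target = self.phi(s_next)
        pred = self._predict(z)
        for i, row in enumerate(self.fwd):
            err = pred[i] - target[i]
            for j in range(len(row)):
                row[j] -= self.lr * err * z[j]


icm = TinyICM(state_dim=3, action_dim=2)
s, a, s_next = [0.5, -0.2, 0.1], [1.0, 0.0], [0.6, -0.1, 0.1]
before = icm.intrinsic_reward(s, a, s_next)
for _ in range(50):
    icm.update(s, a, s_next)  # the same transition becomes familiar
after = icm.intrinsic_reward(s, a, s_next)
print(before, after)
```

During training, the intrinsic term would be added to the environment reward, for example r = r_ext + beta * r_i with a hypothetical weight beta; how DSAC-ICM balances the two terms is not stated in this abstract.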

Abstract

Ground autonomous mobile robots are increasingly critical for reconnaissance, patrol, and resupply tasks in public safety and national defense scenarios, where global path planning in 3D uneven terrains remains a major challenge. Traditional planners struggle with high dimensionality, while Deep Reinforcement Learning (DRL) is hindered by two key issues: (1) systematic overestimation of action values (Q-values) due to function approximation error, which leads to suboptimal policies and training instability; and (2) inefficient exploration under sparse reward signals. To address these limitations, we propose DSAC-ICM: a Distributional Soft Actor-Critic framework integrated with an Intrinsic Curiosity Module (ICM). Our method fundamentally shifts the learning paradigm from estimating scalar Q-values to learning the full probability distribution of state-action returns, which inherently mitigates value overestimation. We further integrate the ICM to generate dense intrinsic rewards, guiding the agent toward novel and unvisited states to tackle the exploration challenge. Comprehensive experiments conducted in a suite of realistic 3D uneven-terrain environments demonstrate that DSAC-ICM successfully enables the agent to learn effective navigation capabilities. Crucially, it achieves a superior trade-off between path quality and computational cost when compared to traditional path planning algorithms. Furthermore, DSAC-ICM significantly outperforms other RL baselines in terms of convergence speed and return.
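
The abstract's central idea, replacing a scalar Q-value with a learned distribution over returns, can be illustrated with a toy quantile-regression sketch. It assumes a quantile parameterization of the kind used in QR-DQN; the abstract does not say how DSAC-ICM parameterizes the return distribution. The functions quantile_fractions, pinball_step and fit_return_quantiles, the Uniform(0, 10) return distribution, and the step size are all hypothetical. The sketch also omits bootstrapping: a real distributional critic would regress toward samples of r + gamma * Z(s', a') rather than draws from a fixed distribution.

```python
import random


def quantile_fractions(n):
    """Midpoint quantile fractions tau_i = (2i + 1) / (2n), as used in QR-DQN."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def pinball_step(thetas, taus, target, lr):
    """One SGD step on the quantile (pinball) loss for a single sampled return.

    With u = target - theta_i, the loss is u * (tau_i - 1[u < 0]); its derivative
    w.r.t. theta_i is -(tau_i - 1[u < 0]), so descent adds lr * (tau_i - 1[u < 0]).
    """
    for i, tau in enumerate(taus):
        below = 1.0 if target < thetas[i] else 0.0  # 1[u < 0]
        thetas[i] += lr * (tau - below)


def fit_return_quantiles(sample_return, n_quantiles=5, steps=20000, lr=0.02, seed=0):
    """Learn n_quantiles quantile estimates of a return distribution from samples."""
    rng = random.Random(seed)
    taus = quantile_fractions(n_quantiles)
    thetas = [0.0] * n_quantiles
    for _ in range(steps):
        pinball_step(thetas, taus, sample_return(rng), lr)
    return taus, thetas


# Hypothetical return distribution for one (state, action) pair: Uniform(0, 10).
taus, thetas = fit_return_quantiles(lambda rng: rng.uniform(0.0, 10.0))
q_value = sum(thetas) / len(thetas)  # scalar Q recovered as the mean of the quantiles
print([round(t, 1) for t in thetas], round(q_value, 1))
```

The mean of the learned quantiles still recovers the scalar Q-value, so nothing is lost relative to a scalar critic, while the spread of the quantiles exposes the return uncertainty that a single point estimate hides.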

Cite This Study

Zhou et al.

DOI: https://doi.org/10.3390/s26030853
synapsesocial.com/papers/69a75be3c6e9836116a24085