This paper presents a unified empirical study of extended Soft Actor-Critic (SAC) methods for sparse-reward TurtleBot3 navigation in Gazebo under dense 360° LiDAR observations (360 beams). We introduce SAC-XH, a streamlined SAC extension that augments the sparse task reward with auxiliary shaping signals and integrates a stage-wise curriculum to improve exploration and sample efficiency. Across progressively complex Gazebo environments, SAC-XH consistently outperforms SAC, TD3, and DDPG in training stability, learning efficiency, and success rate, while maintaining full reproducibility through an open-source ROS 2/Gazebo framework. Additionally, we evaluate a stage-wise Curriculum Learning protocol on top of SAC-XH, using competence-based advancement and controlled replay transfer. Under calibrated thresholds, the curriculum yields stable convergence and high success rates (87–91%), improving generalization across stages compared to non-curriculum training. These results demonstrate that SAC-XH improves convergence and generalization across multiple Gazebo-simulated navigation environments under sparse-reward conditions, providing a strong DRL baseline for autonomous navigation and a reproducible benchmark for future research.
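The two mechanisms named in the abstract, auxiliary reward shaping and competence-based curriculum advancement, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `CompetenceCurriculum`, the function `shaped_reward`, the distance-progress shaping term, and the default threshold and window values are all hypothetical choices made for illustration only.

```python
from collections import deque


class CompetenceCurriculum:
    """Advance to the next stage once the rolling success rate clears a threshold.

    Hypothetical sketch: threshold and window are illustrative defaults,
    not values reported in the paper.
    """

    def __init__(self, num_stages, threshold=0.8, window=50):
        self.num_stages = num_stages
        self.threshold = threshold
        self.window = window
        self.stage = 0
        self.outcomes = deque(maxlen=window)  # most recent episode results

    def success_rate(self):
        # Fraction of successful episodes in the current window.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def record(self, success):
        """Record one episode outcome; return True if the stage advanced."""
        self.outcomes.append(bool(success))
        window_full = len(self.outcomes) == self.window
        last_stage = self.stage == self.num_stages - 1
        if window_full and not last_stage and self.success_rate() >= self.threshold:
            self.stage += 1
            self.outcomes.clear()  # measure competence afresh on the new stage
            return True
        return False


def shaped_reward(sparse_reward, prev_dist, curr_dist, progress_weight=1.0):
    """Add an assumed progress-to-goal shaping term to the sparse task reward."""
    return sparse_reward + progress_weight * (prev_dist - curr_dist)
```

In a training loop, `record` would be called once per finished episode, and the environment would be switched to the next stage whenever it returns `True`. How replay data is carried across stages is a separate design decision that this sketch leaves out.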
Fabio Demo Rosa
Raul Steinmetz
University of Tsukuba
Daniel Fernando Tello Gamarra
Universidade Federal de Santa Maria
Journal of Intelligent & Fuzzy Systems
Rosa et al. studied this question.
synapsesocial.com/papers/69be35946e48c4981c673eed — DOI: https://doi.org/10.1177/18758967261431354