What question did this study set out to answer?

The aim is to improve mobile robot navigation through hybrid reinforcement learning methods by addressing challenges posed by sparse rewards.

March 21, 2026

Hybrid Soft Actor-Critic with Curriculum Learning for Sparse-Reward Mobile Robot Navigation

Key Points

Targeted sparse-reward mobile robot navigation using hybrid reinforcement learning methods.
Evaluated extended Soft Actor-Critic methods for TurtleBot3 navigation in Gazebo.
Introduced SAC-XH, which integrates auxiliary shaping signals and a curriculum for better exploration.
Conducted experiments in progressively complex Gazebo environments.
Implemented a stage-wise Curriculum Learning protocol with competence-based advancement.
SAC-XH improved training stability and success rate compared to the SAC, TD3, and DDPG baselines.
Achieved success rates of 87–91% under calibrated thresholds with the curriculum protocol.
Demonstrated improved learning efficiency and generalization across stages compared to non-curriculum training.
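
The competence-based advancement mentioned in the key points can be illustrated with a minimal sketch. It assumes that competence is measured as a rolling success rate over recent episodes and that the agent moves to the next stage once that rate reaches a threshold. The class name `CompetenceCurriculum`, the window size, and the threshold value are hypothetical and are not taken from the paper, whose calibrated thresholds and replay transfer rules are not given on this page.

```python
from collections import deque


class CompetenceCurriculum:
    """Hypothetical stage scheduler: advance when the rolling success rate
    over a full window of episodes reaches a threshold (assumed rule)."""

    def __init__(self, num_stages, threshold=0.85, window=100):
        self.num_stages = num_stages
        self.threshold = threshold          # assumed value, not from the paper
        self.window = window                # assumed value, not from the paper
        self.stage = 0                      # index of the current stage
        self.outcomes = deque(maxlen=window)

    def success_rate(self):
        """Fraction of successful episodes in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, success):
        """Log one episode outcome; return True if the stage advanced."""
        self.outcomes.append(bool(success))
        window_full = len(self.outcomes) == self.window
        can_advance = self.stage < self.num_stages - 1
        if window_full and can_advance and self.success_rate() >= self.threshold:
            self.stage += 1
            self.outcomes.clear()           # competence is re-measured per stage
            return True
        return False
```

In a training loop, `record` would be called once per episode with the episode outcome, and the environment would be switched whenever it returns `True`. Any replay transfer between stages would sit alongside that switch and is not modeled here.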

Abstract

This paper presents a unified empirical study of extended Soft Actor-Critic methods for sparse-reward TurtleBot3 navigation in Gazebo under dense 360° LiDAR observations. We introduce SAC-XH, a streamlined SAC extension that augments the sparse task reward with auxiliary shaping signals and integrates a stage-wise curriculum to improve exploration and sample efficiency. Across progressively complex Gazebo environments, SAC-XH improves training stability and success rate compared to SAC, TD3, and DDPG, while maintaining full reproducibility through an open-source ROS 2/Gazebo framework. SAC-XH consistently outperforms the baselines in learning efficiency and success rate, with dense LiDAR observations (360 beams). Additionally, we evaluate a stage-wise Curriculum Learning protocol on top of SAC-XH, using competence-based advancement and controlled replay transfer. Under calibrated thresholds, the curriculum yields stable convergence and high success rates (87–91%), improving generalization across stages compared to non-curriculum training. These results demonstrate that SAC-XH improves convergence and generalization across multiple Gazebo-simulated navigation environments under sparse-reward conditions, providing a strong DRL baseline for autonomous navigation and a reproducible benchmark for future research.
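
As an illustration of augmenting a sparse task reward with an auxiliary shaping signal, the sketch below uses potential-based shaping on the distance to the goal. This is a swapped-in, commonly used technique chosen for the example. The abstract does not specify which shaping signals SAC-XH uses, so the function names, reward magnitudes, and the distance-based potential are all assumptions.

```python
def sparse_task_reward(reached_goal, collided):
    """Sparse reward: nonzero only on terminal events (assumed magnitudes)."""
    if reached_goal:
        return 1.0
    if collided:
        return -1.0
    return 0.0


def potential(distance_to_goal):
    """Assumed potential function: closer to the goal means higher potential."""
    return -distance_to_goal


def shaped_reward(sparse_reward, prev_distance, curr_distance,
                  gamma=0.99, scale=1.0):
    """Sparse reward plus a potential-based shaping term
    F = gamma * phi(s') - phi(s), weighted by a hypothetical scale factor."""
    shaping = gamma * potential(curr_distance) - potential(prev_distance)
    return sparse_reward + scale * shaping


# Moving from 2.0 m to 1.0 m away from the goal yields a positive bonus,
# even though the sparse task reward for this step is zero.
step_reward = shaped_reward(sparse_task_reward(False, False), 2.0, 1.0)
```

With this form, steps that reduce the distance to the goal receive a small positive signal between the rare terminal rewards, which is the general motivation for shaping in sparse-reward navigation.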

Authors

Fabio Demo Rosa

Raul Steinmetz

University of Tsukuba

Daniel Fernando Tello Gamarra

Universidade Federal de Santa Maria

Journals

Journal of Intelligent & Fuzzy Systems

Institutions

University of Tsukuba

Universidade Federal de Santa Maria
