Reinforcement learning is a widely used approach for autonomous navigation, but it often struggles to reach distant, long-horizon goals under safety constraints. The primary reason is that safety requirements significantly degrade exploration during training, limiting the agent’s ability to discover feasible long-horizon policies. To address this issue, we introduce SG-Safe, a novel learning-based method that decomposes complex navigation tasks into smaller sub-problems using intermediate goals while respecting cumulative safety constraints. Our approach employs two coupled policies: a subgoal policy that generates intermediate subgoals, and a safe policy that leverages these subgoals to guide exploration toward the final objective. This hierarchical structure improves exploration without compromising safety and remains effective even under partial observability. We evaluate our method on autonomous vehicle navigation in simulation and on the Safety Gym benchmark. The experimental results demonstrate a strong safety–performance trade-off under partial observability. On Safety Gym, SG-Safe achieves an average success rate (SR) of 0.87 with a collision rate (CR) of 0.03, improving over the best end-to-end learning-based safe RL baseline (SR 0.40, CR 0.05). On POLAMP, SG-Safe attains SR 0.90 with CR 0.03, while the best single-policy safe RL baseline reaches at most SR 0.36 under the same observation settings. Moreover, on POLAMP, SG-Safe also surpasses the planning-based Lyapunov-RRT baseline, which has access to full environment information, achieving higher success (SR 0.90 vs. 0.34) and a lower collision rate (CR 0.03 vs. 0.51), while eliminating online planning at deployment (PT = 0 s vs. ≈10–30 s per episode).
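The two-policy structure described in the abstract can be illustrated with a minimal Python sketch. It is not the SG-Safe implementation: the function names (`subgoal_policy`, `safe_policy`, `step_cost`, `rollout`), the straight-line placeholder policies, the obstacle cost, and the rule of ending an episode once the accumulated cost exceeds a budget are all assumptions made for illustration. In the method itself both policies are learned, and the cumulative constraint is handled during training.

```python
import math
from dataclasses import dataclass


@dataclass
class RolloutResult:
    reached_goal: bool
    total_cost: float
    steps: int


def _move_toward(position, target, max_dist):
    """Move from position toward target by at most max_dist (snap if closer)."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= max_dist:
        return target
    scale = max_dist / dist
    return (position[0] + dx * scale, position[1] + dy * scale)


def subgoal_policy(position, goal, max_subgoal_dist=2.0):
    """Hypothetical high-level policy: propose a nearby point on the way to the goal."""
    return _move_toward(position, goal, max_subgoal_dist)


def safe_policy(position, subgoal, step_size=0.5):
    """Hypothetical low-level policy: take one bounded step toward the current subgoal."""
    return _move_toward(position, subgoal, step_size)


def step_cost(position, obstacles, radius=0.5):
    """Assumed safety cost: 1.0 if the agent is inside any obstacle radius, else 0.0."""
    for ox, oy in obstacles:
        if math.hypot(position[0] - ox, position[1] - oy) < radius:
            return 1.0
    return 0.0


def rollout(start, goal, obstacles, cost_budget=1.0,
            subgoal_every=4, max_steps=200, tol=1e-6):
    """Run one episode: refresh the subgoal every few steps, track cumulative cost."""
    position, subgoal, total_cost = start, goal, 0.0
    for t in range(max_steps):
        if t % subgoal_every == 0:
            subgoal = subgoal_policy(position, goal)
        position = safe_policy(position, subgoal)
        total_cost += step_cost(position, obstacles)
        if total_cost > cost_budget:
            # Simplification: stop the episode when the cumulative budget is exceeded.
            return RolloutResult(False, total_cost, t + 1)
        if math.hypot(goal[0] - position[0], goal[1] - position[1]) <= tol:
            return RolloutResult(True, total_cost, t + 1)
    return RolloutResult(False, total_cost, max_steps)


if __name__ == "__main__":
    print(rollout((0.0, 0.0), (5.0, 0.0), obstacles=[]))
    print(rollout((0.0, 0.0), (5.0, 0.0), obstacles=[(2.5, 0.0)], cost_budget=0.0))
```

The placeholder policies walk in a straight line and do not avoid obstacles, so the second call fails once the path crosses the obstacle. The sketch only shows how a subgoal generator and a goal-conditioned low-level policy interact under a cumulative cost budget.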
Gregory Gorbov
Independent University of Moscow
Aleksandr Panov
Cognitive Research (United States)
Gorbov et al. (Sat,) studied this question.
synapsesocial.com/papers/69a52de5f1e85e5c73bf115c — DOI: https://doi.org/10.3390/technologies14030146