What question did this study set out to answer?

The aim is to improve path planning in sparse reward environments through the integration of adaptive reward shaping and curriculum learning.

January 23, 2026 | Open Access

Path Planning in Sparse Reward Environments: A DQN Approach with Adaptive Reward Shaping and Curriculum Learning

Key Points

Developed CLARS-DQN, an algorithm that combines Adaptive Reward Shaping (ARS) and Curriculum Learning (CL).
Implemented ARS-DQN, which adds a learnable intrinsic reward to DQN to counter reward sparsity.
Used Prioritized Experience Replay (PER) to improve sample efficiency and training stability.
Staged training progresses from simple to complex tasks to improve generalization.
CLARS-DQN outperforms baseline methods in task success rate and path quality.
Achieved a 12% improvement in task success rate in unseen environments.
Average path length improved by 26% compared to traditional methods.
Demonstrated strong generalization capabilities and improved training efficiency.

Abstract

Deep reinforcement learning (DRL) has shown great potential in path planning tasks. However, in sparse reward environments, DRL still faces significant challenges such as low training efficiency and a tendency to converge to suboptimal policies. Traditional reward shaping methods can partially alleviate these issues, but they typically rely on hand-crafted designs, which often introduce complex reward coupling, make hyperparameter tuning difficult, and limit generalization capability. To address these challenges, this paper proposes Curriculum-guided Learning with Adaptive Reward Shaping for Deep Q-Network (CLARS-DQN), a path planning algorithm that integrates Adaptive Reward Shaping (ARS) and Curriculum Learning (CL). The algorithm consists of two key components: (1) ARS-DQN, which augments the DQN framework with a learnable intrinsic reward function to reduce reward sparsity and dependence on expert knowledge; and (2) a curriculum strategy that guides policy optimization through a staged training process, progressing from simple to complex tasks to enhance generalization. Training also incorporates Prioritized Experience Replay (PER) to improve sample efficiency and training stability. CLARS-DQN outperforms baseline methods in task success rate, path quality, training efficiency, and hyperparameter robustness. In unseen environments, the method improves task success rate and average path length by 12% and 26%, respectively, demonstrating strong generalization. Ablation studies confirm the critical contribution of each module.
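
The three ingredients named in the abstract (an intrinsic reward added to the environment reward, staged curriculum training, and prioritized replay) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The class `PrioritizedReplayBuffer`, the helpers `shaped_reward` and `next_stage`, and every default value (`alpha`, `beta`, `eps`, `weight`, `threshold`, `last_stage`) are hypothetical names and numbers chosen for illustration. The replay buffer follows the commonly used proportional prioritization scheme, and the intrinsic reward is taken as a given number here rather than produced by a learned model as in ARS-DQN.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (list-based, O(n) sampling).

    Hypothetical illustration only; not taken from the paper.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once full

    def add(self, transition):
        # New transitions get the current maximum priority,
        # which makes them likely to be replayed early.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Priority tracks the magnitude of the latest TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


def shaped_reward(extrinsic, intrinsic, weight=0.1):
    """Sparse environment reward plus a weighted intrinsic term (assumed form)."""
    return extrinsic + weight * intrinsic


def next_stage(stage, success_rate, threshold=0.8, last_stage=2):
    """Advance the curriculum once the agent succeeds often enough (assumed rule)."""
    return min(stage + 1, last_stage) if success_rate >= threshold else stage
```

In a training loop built on this sketch, each transition would be stored with `add`, a batch drawn with `sample`, the loss scaled by the returned weights, and the priorities refreshed with `update_priorities` from the new TD errors. The curriculum rule shown (advance when a success-rate threshold is met) is only one simple option; the abstract does not specify how the paper decides when to move to harder tasks.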
