Mobile robot path planning in dynamic environments is challenging because existing deep reinforcement learning methods lack temporal memory, suffer from inefficient sample utilization under uniform replay, and face credit assignment difficulties with sparse rewards. This paper proposes the Self-Attention LSTM TD3 (SAL-TD3) algorithm, which integrates LSTM networks and multi-head self-attention into the TD3 framework to capture temporal dependencies for proactive obstacle avoidance. A rank-based prioritized experience replay with n-step returns improves sample efficiency, and a composite reward function provides dense feedback for efficient policy learning. Experiments show that SAL-TD3 achieves a 91% success rate (vs. 77% for TD3), reduces path length by 16.6%, and lowers collision rate from 23% to 9%. Generalization tests and real-world robot deployment confirm robust sim-to-real transfer performance.
Chen et al. studied this question.
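The abstract names two replay components, rank-based prioritization and n-step returns, without giving their formulas. The following is a minimal sketch of the standard textbook forms of both, not the authors' implementation: the function names, the `alpha` exponent, and the default `gamma` are assumptions chosen for illustration, and in a TD3 setting `bootstrap_value` would typically be the smaller of the two target-critic estimates at the state reached after n steps.

```python
import numpy as np


def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus the discounted bootstrap value.

    rewards: the n rewards r_t, ..., r_{t+n-1} in time order.
    bootstrap_value: value estimate for the state reached after n steps
        (hypothetical input; use 0.0 if the episode terminated).
    """
    g = bootstrap_value
    # Fold from the last reward backwards: g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def rank_based_probabilities(td_errors, alpha=0.7):
    """Sampling probabilities P(i) proportional to (1 / rank(i)) ** alpha.

    rank(i) = 1 for the transition with the largest absolute TD error.
    alpha = 0 recovers uniform sampling (alpha is an assumed hyperparameter).
    """
    td_errors = np.asarray(td_errors, dtype=float)
    order = np.argsort(-np.abs(td_errors))  # indices, largest error first
    ranks = np.empty(len(td_errors), dtype=int)
    ranks[order] = np.arange(1, len(td_errors) + 1)
    priorities = (1.0 / ranks) ** alpha
    return priorities / priorities.sum()


if __name__ == "__main__":
    probs = rank_based_probabilities([0.1, -2.0, 0.5])
    batch = np.random.default_rng(0).choice(len(probs), size=2, p=probs)
    print(probs, batch)
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=4.0, gamma=0.5))
```

Because the probabilities depend only on the ordering of the TD errors and not on their magnitudes, a single outlier transition cannot dominate sampling, which is the usual argument for the rank-based variant over the proportional one.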