Safe and efficient robot navigation in a dense, dynamic crowd is a challenging task. Most existing navigation algorithms need to acquire the full dynamics of neighboring humans at all times, which makes them heavily dependent on a complex upper-level state estimation process. Moreover, the scene perception modules of existing algorithms do not comprehensively model human–robot interaction in the spatiotemporal dimension, which leads to frequent freezing and collisions in reinforcement learning-based navigation algorithms. To address these problems, we propose a fine-grained spatiotemporal graphical attention navigation algorithm (FST-RL), which enriches the scene perception module of reinforcement learning algorithms by jointly encoding human–robot motion patterns, interhuman relations, and long-term dependencies in human–robot interactions. With the proposed algorithm, the built-in spatiotemporal reasoning module generates socially compatible navigation routes while requiring only the positions of agents within the robot's perception domain. Experimental results show that, in a high-density human environment, FST-RL significantly improves on the current best-performing navigation algorithm (DS-RNN) in success rate (22.3% improvement), navigation time (13.6% reduction), and average return (20.8% improvement). Ablation and qualitative experiments show that the scene perception module of FST-RL effectively reduces the robot's collisions and overly conservative behaviors in challenging scenarios.
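To make the idea of position-only spatiotemporal attention concrete, the following is a minimal illustrative sketch and not the FST-RL architecture itself. It assumes hand-set proximity scores in place of learned attention, and a fixed exponential decay in place of a learned temporal module. The function names (`spatial_attention`, `spatiotemporal_attention`) and the parameters `decay` and `temperature` are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(robot_xy, humans_xy, temperature=1.0):
    """Weight each human by proximity to the robot at one time step.

    Assumption: the score is the negative Euclidean distance, a stand-in
    for the learned attention score a real model would compute.
    """
    scores = [-math.dist(robot_xy, h) / temperature for h in humans_xy]
    return softmax(scores)


def spatiotemporal_attention(frames, decay=0.5, temperature=1.0):
    """Blend per-step spatial weights over time, favoring recent steps.

    frames: list of (robot_xy, [human_xy, ...]) tuples, oldest first,
            with the same humans in the same order at every step.
    decay:  hypothetical factor in (0, 1]; 1.0 weights all steps equally.
    Returns one weight per human; the weights sum to 1.
    """
    n_steps = len(frames)
    raw = [decay ** (n_steps - 1 - t) for t in range(n_steps)]
    norm = sum(raw)
    step_weights = [r / norm for r in raw]

    n_humans = len(frames[0][1])
    blended = [0.0] * n_humans
    for w, (robot_xy, humans_xy) in zip(step_weights, frames):
        per_step = spatial_attention(robot_xy, humans_xy, temperature)
        for i, a in enumerate(per_step):
            blended[i] += w * a
    return blended


# Human 0 approaches the robot over three steps; human 1 stays put.
frames = [
    ((0.0, 0.0), [(4.0, 0.0), (1.0, 0.0)]),
    ((0.0, 0.0), [(2.0, 0.0), (1.0, 0.0)]),
    ((0.0, 0.0), [(0.5, 0.0), (1.0, 0.0)]),
]
recent_biased = spatiotemporal_attention(frames, decay=0.5)
uniform = spatiotemporal_attention(frames, decay=1.0)
```

In this toy scene, weighting recent steps more heavily raises the attention on the approaching human relative to uniform temporal weighting, using nothing but observed positions. A learned model would replace both the distance-based score and the fixed decay with trained components.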
Ma et al. studied this question.