In scenarios such as underground exploration and mine surveying, distributed multi-robot cooperation is hampered by communication problems such as wireless signal attenuation and multipath interference. These limit each robot's perception and the information shared among robots, which in turn reduces trajectory planning efficiency. This paper proposes a multi-robot trajectory planning method based on spatio-temporal graph reinforcement learning. First, a small-scale fading channel model is used to compute the edge weights of the communication topology, and a trajectory planning model based on a spatio-temporal graph convolutional network (STGCN) is constructed. Multi-hop message passing via graph convolution carries information across dynamic communication links, and a gated recurrent unit (GRU) memorizes and reconstructs observations, improving each robot's perception in fading signal environments. Next, a spatio-temporal graph-based multi-agent twin delayed deep deterministic policy gradient (STG-MATD3) algorithm is built on the spatio-temporal graph neural network. It uses an actor-critic architecture for distributed policy training and optimization, enabling multi-robot trajectory planning. The results show that STG-MATD3 achieves a task completion rate of 98.8% and shortens the average trajectory distance by 10.78% to 18.59% compared to the baseline algorithms, outperforming reinforcement learning algorithms such as MADDPG and MASAC. Finally, ablation experiments and generalization tests validate the effectiveness and adaptability of the STGCN model in multi-robot trajectory planning, supporting the application potential of multi-robot underground exploration.
Chen et al. studied this question.
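The first two steps of the method, fading-based edge weights followed by multi-hop graph convolution, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The abstract does not name the fading distribution, so Rayleigh fading (exponentially distributed power gain) is assumed here, and the path-loss exponent, the link threshold, and the function names `fading_edge_weights`, `normalized_adjacency`, and `multi_hop_propagate` are all hypothetical. Learned weight matrices, nonlinearities, and the GRU are omitted.

```python
import numpy as np


def fading_edge_weights(positions, rng, path_loss_exp=3.0, threshold=0.05):
    """Weighted adjacency of the communication graph.

    Assumption: Rayleigh small-scale fading, so the power gain |h|^2 is
    exponentially distributed with unit mean; large-scale loss is d^-alpha.
    """
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Draw one fading gain per robot pair and mirror it (reciprocal channel).
    gain = np.triu(rng.exponential(1.0, size=(n, n)), k=1)
    gain = gain + gain.T
    # Clamp distances below 1 so the path-loss term never exceeds 1.
    weights = gain * np.maximum(dist, 1.0) ** (-path_loss_exp)
    # Links weaker than the (assumed) threshold are treated as broken.
    weights[weights < threshold] = 0.0
    np.fill_diagonal(weights, 0.0)
    return weights


def normalized_adjacency(weights):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used in graph convolution."""
    a_hat = weights + np.eye(len(weights))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_hop_propagate(features, weights, hops=2):
    """Spread node features over `hops` rounds of neighbor aggregation."""
    a_norm = normalized_adjacency(weights)
    out = features
    for _ in range(hops):
        out = a_norm @ out
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positions = rng.uniform(0.0, 5.0, size=(4, 2))  # four robots in a 5 m square
    w = fading_edge_weights(positions, rng)
    observations = rng.normal(size=(4, 3))          # three features per robot
    print(multi_hop_propagate(observations, w, hops=2))
```

With `hops=2`, a robot can receive information from a teammate it has no direct link to, provided both share a neighbor. When every link has faded out, the normalized adjacency reduces to the identity and each robot keeps only its own observation, which is the situation the GRU memory in the paper is meant to mitigate.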