What question did this study set out to answer?

How can multi-robot trajectory planning efficiency be improved in communication-challenged environments, specifically underground scenarios?

May 14, 2026

A spatio-temporal graph reinforcement learning-based multi-robot trajectory planning method for small-scale fading in underground communications

Key Points

Proposed a multi-robot trajectory planning method utilizing spatio-temporal graph reinforcement learning.
Calculated the edge weights of the communication topology using a small-scale fading channel model and constructed a trajectory planning model based on spatio-temporal graph convolutional networks (STGCN).
Developed a spatio-temporal graph-based multi-agent twin delayed deep deterministic policy gradient (STG-MATD3) algorithm for distributed training.
Achieved a task completion rate of 98.8%.
Shortened average trajectory distance by 10.78% to 18.59% compared to baseline algorithms.
Significantly outperformed existing reinforcement learning algorithms such as MADDPG and MASAC.
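The summary does not say which small-scale fading distribution or weight formula the authors use. The sketch below is one plausible construction, not the paper's: it assumes Rayleigh fading (a standard small-scale model for non-line-of-sight, multipath-rich links such as tunnels) on top of a log-distance path-loss model, and takes each edge weight to be the probability that a link's instantaneous SNR clears a decoding threshold. The function names and all numeric defaults (transmit power, noise floor, path-loss exponent, SNR threshold) are hypothetical.

```python
import numpy as np


def mean_snr(dist_m, tx_power_dbm=20.0, noise_dbm=-90.0,
             ref_loss_db=40.0, path_loss_exp=3.0):
    """Mean SNR (linear scale) from a log-distance path-loss model, d0 = 1 m.

    All default values are illustrative assumptions, not values from the paper.
    """
    d = np.maximum(dist_m, 1.0)  # clamp to the 1 m reference distance
    path_loss_db = ref_loss_db + 10.0 * path_loss_exp * np.log10(d)
    snr_db = tx_power_dbm - path_loss_db - noise_dbm
    return 10.0 ** (snr_db / 10.0)


def rayleigh_edge_weights(positions, snr_threshold_db=10.0, **channel_kwargs):
    """Hypothetical edge weights: link success probability under Rayleigh fading.

    Instantaneous SNR is gamma = gamma_bar * |h|^2 with |h|^2 ~ Exp(1),
    so P(gamma >= gamma_th) = exp(-gamma_th / gamma_bar).
    """
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    gamma_bar = mean_snr(dist, **channel_kwargs)
    gamma_th = 10.0 ** (snr_threshold_db / 10.0)
    w = np.exp(-gamma_th / gamma_bar)
    np.fill_diagonal(w, 0.0)  # no self-loops in the communication graph
    return w


def sample_fading_snr(gamma_bar, rng, size):
    """Draw instantaneous SNRs for one link: gamma_bar * |h|^2, |h|^2 ~ Exp(1)."""
    return gamma_bar * rng.exponential(1.0, size)


if __name__ == "__main__":
    robots = [[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]]  # positions in metres
    print(np.round(rayleigh_edge_weights(robots), 3))
```

Because |h|^2 is exponentially distributed under Rayleigh fading, the success probability has a closed form, so building the weighted graph needs no sampling; sample_fading_snr is only there to check that closed form empirically or to simulate per-step link drops. Weights fall smoothly with distance instead of switching off at a hard range cutoff, which is one way a fading-aware topology can differ from a simple disc model.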

Abstract

In scenarios such as underground exploration and mine surveying, distributed multi-robot cooperation faces communication challenges such as wireless signal attenuation and multipath interference, which limit perception and reduce information sharing, thereby degrading trajectory planning efficiency. This paper proposes a multi-robot trajectory planning method based on spatio-temporal graph reinforcement learning. First, a small-scale fading channel model is used to calculate the edge weights of the communication topology, and a trajectory planning model based on spatio-temporal graph convolutional networks (STGCN) is constructed. Multi-hop message passing via graph convolution propagates information across dynamic communication links, and a gated recurrent unit (GRU) is introduced to memorize and reconstruct observation information, enhancing each robot's information perception in fading signal environments. Next, a spatio-temporal graph-based multi-agent twin delayed deep deterministic policy gradient (STG-MATD3) algorithm is designed on top of spatio-temporal graph neural networks. It uses an Actor-Critic architecture for distributed policy training and optimization, enabling multi-robot trajectory planning. The results show that STG-MATD3 achieves a task completion rate of 98.8% and shortens the average trajectory distance by 10.78% to 18.59% compared with baseline algorithms, significantly outperforming reinforcement learning algorithms such as MADDPG and MASAC. Finally, ablation experiments and generalization tests validate the effectiveness and adaptability of the STGCN model in multi-robot trajectory planning, strengthening the application potential of multi-robot underground exploration.
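The abstract describes graph convolution for multi-hop message passing followed by a GRU that carries observation memory across time steps, but it gives no layer equations. A minimal NumPy sketch of that pipeline might look like the following, assuming Kipf-Welling style symmetric normalization of the weighted adjacency matrix, one hop of reach per stacked graph-convolution layer, and one common GRU formulation applied per robot. All function and parameter names are hypothetical, and the weights here are random; in a trained model they would be learned as part of the actor and critic networks.

```python
import numpy as np


def normalized_adjacency(w):
    """Symmetric GCN normalization D^-1/2 (W + I) D^-1/2 of a weighted graph."""
    a = w + np.eye(w.shape[0])  # self-loops keep each robot's own features
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_hop_gcn(x, w, thetas):
    """Stack len(thetas) graph convolutions; each layer adds one hop of reach."""
    a_hat = normalized_adjacency(w)
    h = x
    for theta in thetas:
        h = np.maximum(a_hat @ h @ theta, 0.0)  # ReLU(A_hat H Theta)
    return h


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_cell(x, h, p):
    """One GRU step applied row-wise (one row per robot)."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])  # candidate
    return (1.0 - z) * h + z * h_cand


def init_gru_params(in_dim, hid_dim, rng, scale=0.1):
    """Random, untrained GRU weights (hypothetical initialization)."""
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, scale, (in_dim, hid_dim))
        p["U" + g] = rng.normal(0.0, scale, (hid_dim, hid_dim))
        p["b" + g] = np.zeros(hid_dim)
    return p


def stgcn_encode(obs_seq, w_seq, thetas, gru_params):
    """Spatial graph convolution at every step, GRU memory across steps."""
    hid_dim = gru_params["Uz"].shape[0]
    h = np.zeros((obs_seq[0].shape[0], hid_dim))
    for x_t, w_t in zip(obs_seq, w_seq):
        h = gru_cell(multi_hop_gcn(x_t, w_t, thetas), h, gru_params)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, obs_dim, g_dim, hid_dim, steps = 4, 3, 8, 16, 5
    thetas = [rng.normal(0.0, 0.5, (obs_dim, g_dim)),
              rng.normal(0.0, 0.5, (g_dim, g_dim))]  # two layers = two hops
    params = init_gru_params(g_dim, hid_dim, rng)
    obs_seq = [rng.normal(size=(n, obs_dim)) for _ in range(steps)]
    w_seq = []
    for _ in range(steps):
        m = rng.uniform(size=(n, n))  # stand-in for time-varying link weights
        w = (m + m.T) / 2.0
        np.fill_diagonal(w, 0.0)
        w_seq.append(w)
    print(stgcn_encode(obs_seq, w_seq, thetas, params).shape)  # (4, 16)
```

Stacking K graph-convolution layers lets a robot's features reach teammates up to K hops away, so a robot can still receive information relayed from a teammate it cannot hear directly; the GRU state then lets it lean on recent observations when a link fades out. In practice w_seq would come from a fading-aware weight computation at each time step rather than from uniform random numbers.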
