Modeling spatio-temporal dependencies in dynamic multi-agent systems is a fundamental challenge in engineering and computational intelligence. Real-world networks such as traffic flows, social diffusion, and collective motion evolve under nonlinear, nonstationary, and partially observable interactions, and traditional sequence models and static graph learning frameworks often fail to capture their structural dynamics and long-range dependencies. To address these challenges, we propose the Graph Diffusion Transformer (GDT), a unified framework that integrates multi-hop diffusion encoding with structure-aware temporal attention for interpretable and stable multi-agent propagation forecasting. The diffusion encoder explicitly models forward and backward information flow as learnable graph operators, providing spatial inductive priors that guide temporal attention alignment. A diffusion-consistency regularizer further constrains latent transitions, ensuring temporal smoothness and suppressing instability over long horizons. Theoretically, we establish convergence and Lyapunov-type stability guarantees. Empirically, GDT achieves state-of-the-art performance across heterogeneous benchmarks, including PEMS-BAY, Weibo-Propagation, and MultiAgent-CrowdSim. Compared with leading spatio-temporal models such as ST-Transformer and DCRNN, GDT reduces RMSE by up to 18%, maintains stable predictions under topological perturbations, and exhibits interpretable diffusion attention that reveals dominant propagation pathways. Overall, GDT bridges physically grounded diffusion processes and attention-based temporal reasoning, offering a robust, scalable, and transparent solution for multi-agent forecasting on dynamic graphs.
Liang et al. studied this question.
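
To make the two spatial ingredients named in the abstract more concrete, the following is a minimal NumPy sketch of a multi-hop, bidirectional diffusion encoder and one possible diffusion-consistency penalty. It is an illustration under stated assumptions, not the GDT implementation: the abstract gives no formulas, so the random-walk normalization (borrowed from the DCRNN-style diffusion convolution), the per-hop scalar weights `theta_fwd` and `theta_bwd`, the function names, and the squared-error form of the penalty are all hypothetical choices made here.

```python
import numpy as np


def transition_matrices(adj):
    """Random-walk matrices from out-degree (forward) and in-degree (backward).

    Assumption: DCRNN-style normalisation; rows of isolated nodes stay zero.
    """
    adj = np.asarray(adj, dtype=float)
    out_deg = adj.sum(axis=1, keepdims=True)
    in_deg = adj.T.sum(axis=1, keepdims=True)
    fwd = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    bwd = np.divide(adj.T, in_deg, out=np.zeros_like(adj), where=in_deg > 0)
    return fwd, bwd


def diffusion_encode(x, adj, theta_fwd, theta_bwd):
    """Weighted sum of 0..K-hop diffused features in both directions.

    x: (n_nodes, n_feat) node features.
    theta_fwd, theta_bwd: length K+1 per-hop weights (hypothetical; in a
    trained model these would be learnable parameters).
    """
    x = np.asarray(x, dtype=float)
    fwd, bwd = transition_matrices(adj)
    out = np.zeros_like(x)
    xf, xb = x, x
    for k in range(len(theta_fwd)):
        out += theta_fwd[k] * xf + theta_bwd[k] * xb
        xf = fwd @ xf  # one more hop along the forward operator
        xb = bwd @ xb  # one more hop along the backward operator
    return out


def diffusion_consistency_penalty(latents, adj, theta_fwd, theta_bwd):
    """Assumed regulariser: mean squared gap between each latent state and
    the diffusion of its predecessor. latents: (T, n_nodes, n_feat)."""
    latents = np.asarray(latents, dtype=float)
    preds = np.stack(
        [diffusion_encode(h, adj, theta_fwd, theta_bwd) for h in latents[:-1]]
    )
    return float(np.mean((latents[1:] - preds) ** 2))
```

Under these assumptions the penalty is zero exactly when every latent state equals the diffusion of the previous one, which is one way to read "constraining latent transitions". In a full model it would be added to the forecasting loss with a weighting coefficient, and the encoder output would feed the temporal attention layers.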