This paper presents STFusioner, a traffic flow prediction model that combines spatiotemporal feature decoupling encoding with a bidirectional cross-attention-based feature mixing mechanism (Bi-STCA). The model follows a “decoupling-coupling” paradigm for spatiotemporal representation learning, designed for accurate traffic flow prediction in large-scale road networks. Specifically, it first disentangles spatial and temporal features through parallel multi-head self-attention modules, then fuses them dynamically through the proposed Bi-STCA module, which captures the complex spatiotemporal interactions inherent in traffic flow. This paradigm enables the model to learn empirical traffic dynamics implicitly, specifically the propagation of shockwaves, and thereby supports accurate prediction of the evolving state of the traffic network. Extensive experiments on real-world datasets show that STFusioner achieves state-of-the-art performance in most cases, outperforming existing models with average relative improvements of 1.34% (MAE), 0.77% (RMSE), and 1.24% (MAPE). Ablation studies confirm the pivotal role of the Bi-STCA module in feature fusion and in the performance gains. Owing to its generalizability, STFusioner can be readily adapted to practical traffic flow prediction tasks, making it a versatile option for real-world applications.
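To make the “decoupling-coupling” idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single attention head, identity query/key/value projections (no learned weights), no residual connections or normalization, and fusion by simple averaging of the two cross-attention directions. The function names `decouple` and `bidirectional_cross_attention` are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def decouple(x):
    """Decoupling step (assumed form). x[t][n] is the feature vector of
    node n at time step t. Returns spatial features (attention across
    nodes within each time step) and temporal features (attention across
    time steps for each node), both with the same shape as x."""
    T, N = len(x), len(x[0])
    spatial = [attention(x[t], x[t], x[t]) for t in range(T)]
    temporal = [[None] * N for _ in range(T)]
    for n in range(N):
        series = [x[t][n] for t in range(T)]
        for t, vec in enumerate(attention(series, series, series)):
            temporal[t][n] = vec
    return spatial, temporal

def bidirectional_cross_attention(spatial, temporal):
    """Coupling step (assumed form): each branch queries the other, and
    the two results are averaged element-wise."""
    T, N = len(spatial), len(spatial[0])
    s_tokens = [vec for step in spatial for vec in step]
    t_tokens = [vec for step in temporal for vec in step]
    s_from_t = attention(s_tokens, t_tokens, t_tokens)  # spatial queries temporal
    t_from_s = attention(t_tokens, s_tokens, s_tokens)  # temporal queries spatial
    fused = [[(a + b) / 2 for a, b in zip(u, v)]
             for u, v in zip(s_from_t, t_from_s)]
    return [fused[t * N:(t + 1) * N] for t in range(T)]

# Toy input: 3 time steps, 2 nodes, 2 features per node.
x = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
spatial, temporal = decouple(x)
fused = bidirectional_cross_attention(spatial, temporal)
```

Here `fused` has the same time-by-node-by-feature shape as the input. A practical model would presumably add learned projections, multiple heads, and a prediction head on top of the fused representation; those details are not given in the abstract and are omitted here.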
Deng et al. studied this question.