Multi-agent trajectory prediction is critical for safe and efficient autonomous driving, yet it remains highly challenging. The difficulty arises from three sources: the complex dynamic behaviors of heterogeneous agents, the influence of static road semantics, and the intricate dynamic couplings between agents and their environment. Traditional single-agent trajectory prediction methods have two critical limitations, insufficient modeling of interactions among agents and limited multi-modal generation capability, which hinder their performance in multi-agent collaborative scenarios. To address these challenges, we propose the PointNetPlus Transformer (PNPT) framework, built upon a Transformer encoder-decoder structure. First, a Multi-scale Residual-enhanced Polyline Encoder (MRPE) extracts multi-scale local geometric features of the scene context and strengthens semantic scene understanding. Second, unbiased local coordinate encoding and query-guided attention improve modeling efficiency and capture local spatial correlations. Third, an interaction-aware intent query module enhances multi-modal generation and multi-agent interaction modeling. Together, these designs improve the accuracy and reliability of multi-agent, multi-modal trajectory prediction while preserving efficiency. On the Waymo Open Motion Dataset (WOMD), PNPT achieves state-of-the-art performance with a minADE of 0.5683, a minFDE of 1.1824, a miss rate of 11.43%, and an mAP of 47.21%, outperforming strong baselines. Extensive ablation studies verify the effectiveness of each module.
Li et al. (Thu,) studied this question.