Accurate motion prediction of traffic participants is essential for the safe planning of autonomous vehicles. However, conventional interaction modeling often lacks a hierarchical information transfer mechanism, making it difficult to capture deep dependencies between agents and road topology. Furthermore, traditional direct regression or static anchor-based decoding strategies struggle to adapt to diverse driving intentions and lack mechanisms for geometric refinement based on scene context. To address these issues, this paper proposes a trajectory prediction model based on hierarchical spatial-aware interaction and intent guidance. First, a hybrid feature encoder is constructed by integrating spatio-temporal attention with State Space Models (SSMs) to efficiently extract long- and short-range contextual features from dynamic scenes. Second, a hierarchical bidirectional spatial interaction mechanism is developed. By establishing multi-round information flows between agents and lane lines, the model achieves hierarchical feature fusion ranging from global scene perception to local fine-grained interaction. Finally, an intent-guided two-stage decoding strategy is designed: the first stage generates initial anchors incorporating intent priors via a multimodal network, while the second stage applies geometric bias correction based on scene context to produce refined trajectories that adhere to physical constraints. Experimental results on the Argoverse 1 and 2 benchmarks demonstrate that the proposed model achieves a superior balance between prediction accuracy and inference speed, validating the effectiveness of the hierarchical interaction and two-stage decoding strategies.
Yi et al. studied this question.
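The two-stage decoding idea in the abstract (initial anchors first, then a context-based geometric correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the learned multimodal network is replaced by straight-line interpolation toward hypothetical goal points, and the learned bias correction is replaced by a fixed pull toward the nearest lane-centerline vertex. The names `propose_anchors`, `refine`, `decode`, and the `gain` parameter are assumptions introduced only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def propose_anchors(start: Point, goals: List[Point], horizon: int) -> List[Trajectory]:
    """Stage 1 (stand-in): one straight-line anchor per candidate goal.

    The paper generates intent-aware anchors with a multimodal network;
    linear interpolation toward hypothetical goals plays that role here.
    """
    anchors = []
    for gx, gy in goals:
        anchor = [
            (start[0] + (gx - start[0]) * t / horizon,
             start[1] + (gy - start[1]) * t / horizon)
            for t in range(1, horizon + 1)
        ]
        anchors.append(anchor)
    return anchors


def nearest_vertex(p: Point, lane: List[Point]) -> Point:
    """Closest lane-centerline vertex to p (vertex level, no segment projection)."""
    return min(lane, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def refine(anchor: Trajectory, lane: List[Point], gain: float = 0.5) -> Trajectory:
    """Stage 2 (stand-in): add a geometric offset to every anchor waypoint.

    The offset moves each waypoint a fraction `gain` of the way toward the
    nearest lane vertex; a learned model would predict this offset instead.
    """
    refined = []
    for p in anchor:
        q = nearest_vertex(p, lane)
        offset = (gain * (q[0] - p[0]), gain * (q[1] - p[1]))
        refined.append((p[0] + offset[0], p[1] + offset[1]))
    return refined


def decode(start: Point, goals: List[Point], lane: List[Point],
           horizon: int = 4, gain: float = 0.5) -> List[Trajectory]:
    """Run both stages: propose one anchor per goal, then refine each anchor."""
    return [refine(a, lane, gain) for a in propose_anchors(start, goals, horizon)]


if __name__ == "__main__":
    lane = [(float(x), 1.0) for x in range(5)]  # toy centerline at y = 1
    for traj in decode((0.0, 0.0), [(4.0, 0.0), (4.0, 2.0)], lane):
        print(traj)
```

The point of the sketch is the decomposition: the final trajectory is an anchor plus an offset, so the second stage only needs to model a small correction rather than the full path. Setting `gain=0.0` disables the correction and returns the raw anchors, which makes the contribution of each stage easy to inspect.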