What question did this study set out to answer?

This work aims to improve the motion prediction of traffic participants for autonomous vehicles through enhanced interaction modeling.

May 24, 2026

HSIG-Net: Trajectory prediction based on hierarchical spatial perception interaction and intent guidance

Key Points

Developed a hybrid feature encoder combining spatio-temporal attention and State Space Models to extract contextual features.
Introduced a hierarchical bidirectional spatial interaction mechanism for multi-round information flow between agents and lane lines.
Implemented a two-stage decoding strategy to generate refined trajectories incorporating intent priors and geometric corrections.
Achieved a better balance between prediction accuracy and inference speed than conventional models.
Validated the hierarchical interaction and two-stage decoding strategies on the Argoverse 1 and 2 benchmarks.
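
The multi-round information flow between agents and lane lines can be pictured with a minimal sketch. It is not the authors' implementation: it assumes plain single-head dot-product cross-attention with no learned projections, and the names `cross_attend` and `bidirectional_interaction`, the feature width of 16, and the two rounds are all hypothetical choices for illustration. It also leaves out the global-to-local hierarchy that the paper describes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product cross-attention with a residual connection.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return queries + weights @ keys_values          # residual update

def bidirectional_interaction(agents, lanes, rounds=2):
    """Alternate agent-to-lane and lane-to-agent updates for several rounds."""
    for _ in range(rounds):
        lanes = cross_attend(lanes, agents)    # lanes gather agent information
        agents = cross_attend(agents, lanes)   # agents gather updated lane information
    return agents, lanes

# Hypothetical scene: 4 agents and 10 lane segments, 16-dimensional features.
rng = np.random.default_rng(0)
agents = rng.normal(size=(4, 16))
lanes = rng.normal(size=(10, 16))
agents_out, lanes_out = bidirectional_interaction(agents, lanes, rounds=2)
print(agents_out.shape, lanes_out.shape)
```

Each round lets lane features absorb agent context before agents read the lanes back, so information can propagate between agents that share a lane without attending to each other directly.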

Abstract

Accurate motion prediction of traffic participants is essential for the safe planning of autonomous vehicles. However, conventional interaction modeling often lacks a hierarchical information transfer mechanism, making it difficult to capture deep dependencies between agents and road topology. Furthermore, traditional direct regression or static anchor-based decoding strategies struggle to adapt to diverse driving intentions and lack mechanisms for geometric refinement based on scene context. To address these issues, this paper proposes a trajectory prediction model based on hierarchical spatial-aware interaction and intent guidance. First, a hybrid feature encoder is constructed by integrating spatio-temporal attention with State Space Models (SSMs) to efficiently extract long- and short-range contextual features from dynamic scenes. Second, a hierarchical bidirectional spatial interaction mechanism is developed. By establishing multi-round information flows between agents and lane lines, the model achieves hierarchical feature fusion ranging from global scene perception to local fine-grained interaction. Finally, an intent-guided two-stage decoding strategy is designed: the first stage generates initial anchors incorporating intent priors via a multimodal network, while the second stage applies geometric bias correction based on scene context to produce refined trajectories that adhere to physical constraints. Experimental results on the Argoverse 1 and 2 benchmarks demonstrate that the proposed model achieves a superior balance between prediction accuracy and inference speed, validating the effectiveness of the hierarchical interaction and two-stage decoding strategies.
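
The two-stage decoding idea in the abstract (intent-conditioned anchors first, then a context-based geometric correction) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's decoder: the function `two_stage_decode`, the `params` dictionary, the linear layers with random weights, the 6 modes, and the 30-step horizon are all hypothetical, and no physical-constraint handling is modeled.

```python
import numpy as np

def two_stage_decode(agent_feat, intent_emb, scene_ctx, params, horizon=30):
    """Return (anchors, refined), each of shape (K, horizon, 2).

    agent_feat: (D,)   encoded target-agent feature
    intent_emb: (K, D) one embedding per candidate intent (assumed given)
    scene_ctx:  (D,)   pooled scene context feature
    """
    K, D = intent_emb.shape

    # Stage 1: fuse the agent feature with each intent prior and
    # regress one coarse anchor trajectory per mode.
    h = np.tanh(agent_feat[None, :] + intent_emb)              # (K, D)
    anchors = (h @ params["w_anchor"]).reshape(K, horizon, 2)

    # Stage 2: predict a per-waypoint offset from the mode feature and
    # the scene context, then add it to the anchor.
    ctx = np.broadcast_to(scene_ctx, (K, D))
    refine_in = np.concatenate([h, ctx], axis=-1)              # (K, 2D)
    offsets = (refine_in @ params["w_offset"]).reshape(K, horizon, 2)
    return anchors, anchors + offsets

# Hypothetical sizes: 6 modes, 16-dimensional features, 30 future steps.
rng = np.random.default_rng(0)
K, D, T = 6, 16, 30
params = {
    "w_anchor": rng.normal(scale=0.1, size=(D, T * 2)),
    "w_offset": rng.normal(scale=0.01, size=(2 * D, T * 2)),
}
anchors, refined = two_stage_decode(
    rng.normal(size=D), rng.normal(size=(K, D)), rng.normal(size=D), params, horizon=T
)
print(anchors.shape, refined.shape)
```

Predicting an offset on top of an anchor, rather than regressing the final trajectory directly, keeps the second stage's job small: it only has to correct a plausible proposal using scene context.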
