Encoding historical traffic scenarios is critical for trajectory prediction in autonomous driving. However, existing methods face two fundamental challenges. First, they neglect future interactions, which leads to myopic predictions. Second, although query-centric paradigms capture multimodal interactions, their typically static query design may limit the adaptability of cross-temporal context fusion. To address these limitations, we propose Miformer, a trajectory prediction model that enhances historical-future spatial-temporal context interactions. We first introduce a time query mechanism that turns static agent interaction modeling into dynamic temporal modeling by enabling cross-timestep information exchange among agents and road geometry. Building upon this temporal foundation, we design a historical-future spatial-temporal interaction module (HF-STIM) that leverages iTransformer to model bidirectional dependencies between past and future contexts. To address the redundancy inherent in inverted embeddings, we further propose the minus-inverted Transformer (Mi-Transformer), which incorporates a dynamic residual learning mechanism to eliminate irrelevant dependencies while preserving critical bidirectional information flow. Finally, we employ a DETR-like decoder to generate diverse multimodal trajectory predictions. Experimental results show that Miformer achieves state-of-the-art performance on the INTERACTION dataset and competitive performance on the Argoverse dataset, highlighting its effectiveness in real-world motion prediction. Our code is available at https://github.com/Morphlingxxx/Miformer-Trajectory-Prediction.
Wan et al. studied this question.