ABSTRACT High‐definition (HD) map construction is a critical component of autonomous driving, providing precise geometric and semantic priors to support downstream planning modules. Although existing methods have improved accuracy, they still suffer from disordered vectorized sequences and topological distortions caused by noisy sensor inputs and fragmented decoding processes. To overcome these limitations, we propose HDP‐Map, a novel dual‐path framework that incorporates hierarchical semantic‐geometric interaction through two key innovations. First, we introduce the median‐enhanced multi‐scale spatial attention module, a noise‐robust feature fusion mechanism that combines differentiable median pooling with multi‐scale convolutional operations. This design effectively suppresses environmental noise while enhancing local discriminability for elongated structures such as lane markings. Second, we develop a Hierarchical Interaction Decoder, a transformer‐based architecture that jointly optimizes global topological consistency and local geometric continuity through bidirectional refinement between semantic‐level and topology‐level queries. Extensive experiments validate the effectiveness of HDP‐Map on the nuScenes and Argoverse2 benchmarks. On nuScenes, HDP‐Mapv2 achieves 67.3% mean average precision (mAP), surpassing MapTRv2 by 5.8 points; HDP‐Map improves over MapTR by 5.0 points. On Argoverse2, HDP‐Mapv2 attains 69.4% mAP, outperforming MapTRv2 by 2.0 points. The proposed framework offers a deployment‐friendly solution for real‐time HD map construction, striking a balance between precision and efficiency in complex urban environments.
Tian et al. (Mon,) studied this question.