In millimeter-wave vehicle-to-infrastructure (V2I) communication systems, accurate beam prediction is crucial for optimizing network performance and improving signal transmission efficiency. Traditional beam prediction methods rely mainly on single-modal data, which often fails to capture the comprehensive environmental information required for high-accuracy prediction. In contrast, multi-modal approaches leverage complementary information from different data sources and offer a more promising solution. However, many existing fusion methods depend primarily on real-time sensory inputs and do not fully exploit the stable environmental features of V2I scenarios, which limits the effective use of each modality. To address these limitations, this paper proposes an environment-aware proactive beam prediction method based on a multi-modal prior mask map (MMPMM), which integrates offline mapping with an online beam prediction network. Specifically, the method fuses information from images, point clouds, positions, and the MMPMM to predict the optimal beam index. The MMPMM provides channel-related prior information by extracting static V2I scene features offline, without incurring any additional online measurement overhead. Experimental results on real-world datasets show that the proposed method achieves a Top-3 beam prediction accuracy of up to 71.23% while maintaining stable performance under the evaluated dynamic and degraded conditions.
Zhou et al. (Fri,) studied this question.
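The abstract reports a Top-3 beam prediction accuracy but does not define the metric. The sketch below assumes the usual definition: a sample counts as correct when the ground-truth beam index is among the k beams with the highest predicted scores. The function name `top_k_accuracy`, the toy scores, and the five-beam codebook are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   true_beams: Sequence[int],
                   k: int = 3) -> float:
    """Fraction of samples whose true beam is among the k highest-scoring beams.

    Assumed definition of Top-k accuracy; the paper may compute it differently.
    """
    if len(scores) != len(true_beams):
        raise ValueError("scores and true_beams must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    hits = 0
    for sample_scores, true_beam in zip(scores, true_beams):
        # Rank beam indices by descending score and keep the first k.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i],
                        reverse=True)
        if true_beam in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (made up): 4 samples, hypothetical codebook of 5 beams.
toy_scores = [
    [0.05, 0.50, 0.25, 0.15, 0.05],
    [0.60, 0.10, 0.05, 0.20, 0.05],
    [0.10, 0.20, 0.30, 0.35, 0.05],
    [0.02, 0.08, 0.10, 0.30, 0.50],
]
toy_true_beams = [2, 0, 0, 3]

top1 = top_k_accuracy(toy_scores, toy_true_beams, k=1)
top3 = top_k_accuracy(toy_scores, toy_true_beams, k=3)
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

Under this definition, Top-3 accuracy is never lower than Top-1 accuracy on the same predictions, which is why Top-k figures are commonly reported when a short beam sweep over the k candidates is acceptable.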