Advancements in satellite sensor technology have enabled access to diverse remote sensing (RS) data from multiple platforms. Hyperspectral image (HSI) data offers rich spectral detail for material identification, while LiDAR captures high-resolution 3D structural information, making the two modalities naturally complementary. Fusing HSI and LiDAR mitigates the limitations of each and improves tasks such as land cover classification, vegetation analysis, and terrain mapping through more robust spectral–spatial feature representation. However, traditional multi-scale feature fusion models often struggle to align features effectively, which can lead to redundant outputs and diminished spatial clarity. To address these issues, we propose the Cross Attention Bridge for HSI and LiDAR (CAB-HL), a novel dual-path framework that employs a multi-stage cross-attention mechanism to guide the interaction between spectral and spatial features. In CAB-HL, features from each modality are refined across three progressive stages using cross-attention modules, which enhance contextual alignment while preserving the distinctive characteristics of each modality. The refined representations are then fused and passed through a lightweight classification head. Extensive experiments on three benchmark RS datasets demonstrate that CAB-HL consistently outperforms existing state-of-the-art models, confirming its effectiveness in learning deep joint representations for multimodal classification tasks.
Hussain et al. (Fri,) studied this question.
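The dual-path, three-stage cross-attention design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify projections, normalization, token shapes, or the classification head, so everything here is an assumption chosen for brevity. The names `cross_attention`, `bridge`, and `classify` are hypothetical, learned query/key/value projections are replaced by the identity, and the head is assumed to be mean pooling followed by a single linear layer.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_feats, context_feats):
    """Queries come from one modality; keys and values come from the other.
    Assumption: identity projections instead of learned weight matrices."""
    d = len(query_feats[0])
    keys_t = [list(col) for col in zip(*context_feats)]
    scores = matmul(query_feats, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, context_feats)
    # Residual connection (assumed): keeps each modality's own features in the output.
    return [[q + a for q, a in zip(qr, ar)] for qr, ar in zip(query_feats, attended)]

def bridge(hsi, lidar, stages=3):
    """Refine both paths over several stages, then concatenate per token."""
    for _ in range(stages):
        new_hsi = cross_attention(hsi, lidar)    # HSI path attends to LiDAR
        new_lidar = cross_attention(lidar, hsi)  # LiDAR path attends to HSI
        hsi, lidar = new_hsi, new_lidar
    return [h + l for h, l in zip(hsi, lidar)]   # list concatenation per token

def classify(fused, head_weights):
    """Hypothetical lightweight head: mean-pool tokens, one linear layer, softmax."""
    n = len(fused)
    pooled = [sum(col) / n for col in zip(*fused)]
    logits = matmul([pooled], head_weights)[0]
    return softmax(logits)

# Toy inputs: 4 tokens per modality, 8 features each, 5 classes (all arbitrary).
random.seed(0)
tokens, dim, classes = 4, 8, 5
hsi = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
lidar = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]
head = [[random.gauss(0, 0.1) for _ in range(classes)] for _ in range(2 * dim)]

fused = bridge(hsi, lidar)
probs = classify(fused, head)
```

In this sketch each path keeps its own feature stream and only borrows context from the other modality through attention weights, which is one plausible reading of "preserving the distinctive characteristics of each modality"; a real model would add learned projections, normalization, and training.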