What question did this study set out to answer?

The aim is to improve land-cover classification accuracy by effectively integrating hyperspectral and LiDAR data.

April 4, 2026 | Open Access

Multimodal Remote Sensing Image Classification Based on Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion

Key points

Developed a framework called DGC-BCAF focusing on adaptive feature representation.
Implemented a Dynamic Group Convolution (DGConv) module for spatial context extraction.
Introduced a LiDAR texture encoding branch to capture geometric height and surface details.
Employed Bidirectional Guided Cross-Attention Fusion (BCAF) for balanced modality contribution.
Conducted experiments on three benchmark datasets: Houston 2013, Trento, and MUUFL.
DGC-BCAF significantly outperformed state-of-the-art methods in overall accuracy.
Demonstrated improvements in average accuracy and Kappa coefficient.
Effectively distinguished spectrally similar materials and delineated complex urban structures.
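
The Dynamic Group Convolution point above can be illustrated with a small sketch. A conventional group convolution is equivalent to a dense convolution whose weights are multiplied by a fixed block-diagonal 0/1 matrix; a learned relationship matrix generalizes that mask. The code below is only an assumption-laden illustration in NumPy for a 1x1 convolution: the function names, the thresholding rule in `binarize_relation`, and the random logits are hypothetical and are not taken from the paper.

```python
import numpy as np


def block_diagonal_relation(c_out, c_in, groups):
    """Relationship matrix of a conventional group convolution (fixed groups)."""
    relation = np.zeros((c_out, c_in))
    out_per, in_per = c_out // groups, c_in // groups
    for g in range(groups):
        relation[g * out_per:(g + 1) * out_per, g * in_per:(g + 1) * in_per] = 1.0
    return relation


def binarize_relation(logits):
    """Hypothetical learned grouping: keep a connection where its logit is positive."""
    return (logits > 0).astype(float)


def masked_pointwise_conv(x, weight, relation):
    """1x1 convolution with weights gated by a channel relationship matrix.

    x: (c_in, H, W); weight and relation: (c_out, c_in).
    """
    return np.einsum("oc,chw->ohw", weight * relation, x)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 3))
weight = rng.normal(size=(4, 4))

# Fixed grouping: identical to running two independent 2-channel convolutions.
y_fixed = masked_pointwise_conv(x, weight, block_diagonal_relation(4, 4, groups=2))
y_reference = np.concatenate([
    np.einsum("oc,chw->ohw", weight[:2, :2], x[:2]),
    np.einsum("oc,chw->ohw", weight[2:, 2:], x[2:]),
])

# Learned grouping: in a real network the logits would be trained
# (random here, purely for illustration).
relation_logits = rng.normal(size=(4, 4))
y_dynamic = masked_pointwise_conv(x, weight, binarize_relation(relation_logits))
```

In this sketch the fixed mask reproduces ordinary group convolution exactly, while the binarized logits allow any channel-to-channel connection pattern. Training such logits end to end would need a differentiable relaxation, which is omitted here.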

Abstract

The synergistic integration of Hyperspectral Imaging (HSI) and Light Detection and Ranging (LiDAR) data has become a pivotal strategy in remote sensing for precise land-cover classification. However, existing multimodal deep learning frameworks frequently suffer from intrinsic limitations, including rigid feature extraction protocols, underutilization of LiDAR-derived textural information, and asymmetric fusion mechanisms that fail to balance the contributions of spectral and elevation features. To address these challenges, this paper proposes a novel framework named DGC-BCAF, which integrates Dynamic Group Convolution and Bidirectional Guided Cross-Attention Fusion to achieve adaptive feature representation and robust cross-modal interaction. First, a Dynamic Group Convolution (DGConv) module embedded within a ResNet18 backbone serves as the central spatial context extractor. Unlike traditional group convolution, which fixes the channel groups in advance, this module learns a dynamic relationship matrix to group input channels automatically, yielding a flexible, context-aware feature representation that adapts to complex spatial distributions. Second, to overcome the insufficient exploitation of elevation data, we introduce a dedicated LiDAR texture encoding branch. This branch fuses Gray-Level Co-occurrence Matrix (GLCM) statistical features with multi-scale convolutional representations, capturing both geometric height information and the fine-grained surface texture that is critical for distinguishing objects with similar elevations. Finally, central to our architecture is the Bidirectional Guided Cross-Attention Fusion (BCAF) module. Unlike standard unidirectional fusion approaches, BCAF uses LiDAR geometry to guide the selection of salient spectral bands while simultaneously using spectral signatures to emphasize informative LiDAR channels. This mutual guidance ensures a balanced contribution from both modalities.
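
The mutual guidance described for BCAF can be pictured with a deliberately simplified sketch. The abstract does not give the module's equations, so the code below swaps in plain channel gating rather than a full cross-attention layer: each modality is pooled to a channel descriptor, and that descriptor produces sigmoid weights over the other modality's channels. The function name, the projection matrices `w_l2h` and `w_h2l`, and the tensor sizes are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bidirectional_guided_fusion(hsi, lidar, w_l2h, w_h2l):
    """Simplified two-way channel gating (a stand-in, not the paper's BCAF).

    hsi: (c_h, H, W) spectral features; lidar: (c_l, H, W) elevation features.
    w_l2h: (c_h, c_l) maps the LiDAR descriptor to spectral-band weights.
    w_h2l: (c_l, c_h) maps the HSI descriptor to LiDAR-channel weights.
    """
    # Global average pooling gives one descriptor value per channel.
    hsi_desc = hsi.mean(axis=(1, 2))
    lidar_desc = lidar.mean(axis=(1, 2))

    # LiDAR guides the spectral bands; HSI guides the LiDAR channels.
    band_gate = sigmoid(w_l2h @ lidar_desc)
    lidar_gate = sigmoid(w_h2l @ hsi_desc)

    hsi_guided = hsi * band_gate[:, None, None]
    lidar_guided = lidar * lidar_gate[:, None, None]

    # Concatenate along the channel axis so both modalities reach the classifier.
    fused = np.concatenate([hsi_guided, lidar_guided], axis=0)
    return fused, band_gate, lidar_gate


rng = np.random.default_rng(0)
hsi = rng.normal(size=(6, 5, 5))
lidar = rng.normal(size=(2, 5, 5))
w_l2h = rng.normal(size=(6, 2))
w_h2l = rng.normal(size=(2, 6))

fused, band_gate, lidar_gate = bidirectional_guided_fusion(hsi, lidar, w_l2h, w_h2l)
```

Because each gate is computed from the opposite modality, neither branch acts only as a passive input, which is the balance the abstract attributes to bidirectional guidance.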
Extensive experiments conducted on three benchmark datasets—Houston 2013, Trento, and MUUFL—demonstrate that DGC-BCAF consistently outperforms state-of-the-art methods in terms of overall accuracy, average accuracy, and Kappa coefficient. The results confirm that the proposed adaptive grouping and bidirectional guidance strategies significantly improve classification performance, particularly in distinguishing spectrally similar materials and delineating complex urban structures.
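
For readers unfamiliar with the three reported metrics, the sketch below computes overall accuracy (OA), average accuracy (AA), and Cohen's Kappa from a confusion matrix using their standard definitions. It assumes rows are true classes, columns are predicted classes, and every class has at least one true sample. The 2x2 matrix is made-up illustrative data, not a result from the paper.

```python
import numpy as np


def classification_metrics(confusion):
    """Return (OA, AA, Kappa) for a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()

    # OA: fraction of all samples that were classified correctly.
    oa = np.trace(confusion) / total

    # AA: mean of the per-class recalls, so small classes count equally.
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    aa = per_class.mean()

    # Kappa: agreement beyond what the row/column marginals give by chance.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


# Illustrative two-class confusion matrix (made-up numbers).
example = [[50, 10],
           [5, 35]]
oa, aa, kappa = classification_metrics(example)
```

OA can be dominated by large classes, which is why land-cover studies typically report AA and Kappa alongside it.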

Cite This Study

Zhang et al. (Thu,) studied this question.

synapsesocial.com/papers/69d0af52659487ece0fa54ad https://doi.org/10.3390/rs18071066
