What type of study is this?

This is an experimental study.

September 12, 2025 | Open Access

CSTC: Visual Transformer Network with Multimodal Dual Fusion for Hyperspectral and LiDAR Image Classification

Key Points

The CSTC model achieves an overall classification accuracy of 92.32% on the MUUFL dataset, higher than the methods it is compared against.
Experiments show that the CSTC model outperforms the latest HSI–LiDAR classification algorithms, and ablation studies indicate that it is robust and adaptable.
The dual fusion network employs a two-branch architecture to extract contextual features from hyperspectral and LiDAR data.
The model integrates a Transformer and cross-attention module for innovative modal interaction fusion and classification.
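
To make the cross-attention idea in the key points concrete, the following is a minimal NumPy sketch of single-head cross-attention between two token sets. It is not the authors' implementation: the token counts, the embedding width `d_model`, the random projection matrices, and the function name `cross_attention` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one modality,
    keys and values from the other (hypothetical helper)."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16                                   # assumed embedding width
hsi_tokens = rng.normal(size=(9, d_model))     # assumed: 9 HSI tokens
lidar_tokens = rng.normal(size=(4, d_model))   # assumed: 4 LiDAR tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

# HSI tokens attend to LiDAR tokens, and vice versa.
hsi_enriched, attn_h2l = cross_attention(hsi_tokens, lidar_tokens, w_q, w_k, w_v)
lidar_enriched, attn_l2h = cross_attention(lidar_tokens, hsi_tokens, w_q, w_k, w_v)

# Concatenate ("splice") the two token sets along the sequence axis.
fused = np.concatenate([hsi_enriched, lidar_enriched], axis=0)
```

In this sketch each modality queries the other, so the HSI tokens pick up elevation context and the LiDAR tokens pick up spectral context before the two sets are concatenated. Sharing one set of projection matrices across both directions is a simplification made here for brevity.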

Abstract

Convolutional neural networks (CNNs) have made significant progress in multimodal remote sensing image classification, but two limitations remain. First, traditional CNNs rely on fixed-size convolutional kernels, which cannot effectively model or adequately extract contextual information. Second, hyperspectral imagery (HSI) and LiDAR data differ substantially in the information they carry, which hinders effective interaction and fusion between the two. To address this, this paper proposes CSTC, a multimodal dual fusion network based on the Vision Transformer for the collaborative classification of HSI and LiDAR data. The model uses a two-branch architecture: the HSI branch reduces dimensionality with principal component analysis and feeds the result into a cross-connectivity feature fusion module to extract spectral–spatial features, while the LiDAR branch mines spatial elevation features through stacked MobileNetV2 modules. The features of each branch are encoded by a Transformer, and a cross-attention module performs the first stage of modal interaction fusion. The features are then concatenated and passed to a secondary Transformer for deep cross-modal fusion, and a multilayer perceptron completes the classification. Experiments show that the CSTC model achieves overall classification accuracies of 92.32%, 99.81%, 97.90%, and 99.37% on the publicly available MUUFL, Trento, Augsburg, and Houston2013 datasets, respectively, outperforming the latest HSI–LiDAR classification algorithms. Ablation experiments and model performance evaluations further show that CSTC achieves excellent results in terms of robustness, adaptability, and parameter scale.
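
The PCA step in the HSI branch can be illustrated with a short NumPy sketch. This is only an assumption-laden example, not the paper's code: the cube size, the number of retained components, and the helper name `pca_reduce_cube` are hypothetical, and the abstract does not state how many components CSTC keeps.

```python
import numpy as np

def pca_reduce_cube(cube, n_components):
    """Project the spectral axis of an (H, W, bands) cube onto its
    top principal components (hypothetical helper)."""
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    pixels -= pixels.mean(axis=0, keepdims=True)   # center each band
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    components = vt[:n_components]
    reduced = pixels @ components.T
    return reduced.reshape(h, w, n_components)

rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 32))     # assumed toy cube: 8x8 pixels, 32 bands
reduced = pca_reduce_cube(cube, n_components=5)
print(reduced.shape)
```

The spatial layout is preserved and only the spectral dimension shrinks, so spatial patches can still be cut from the reduced cube before being passed to later feature extraction stages.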
