Fusing hyperspectral images (HSIs) with LiDAR data can improve land cover classification performance. However, existing fusion methods do not adequately account for the large discrepancies between the two modalities (e.g., brightness, structure, and possible misalignment), which limits the improvement they achieve. In this paper, we address this problem in the frequency domain and propose a cross-modal hierarchical frequency fusion network (HFNet) for joint classification of HSI and LiDAR data. First, we extract multi-level convolutional features from both modalities. Then, we use spatial activation maps to adaptively fuse the cross-modal frequency features at each level. Because the amplitude and phase information of the two modalities are fused separately, the discrepancy problem is alleviated. Finally, we concatenate the fused features from all levels and use them to build a classification loss and an auxiliary frequency consistency loss (FCL). The FCL requires the concatenated feature to predict the amplitude and phase information of the input data; it acts as a regularization term and improves the model's discrimination ability. Experimental results on two datasets show that HFNet outperforms state-of-the-art methods in classification performance.
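The idea of fusing amplitude and phase separately can be illustrated with a minimal NumPy sketch. It is not the HFNet implementation: the function name `fuse_frequency`, the scalar weights `w_amp` and `w_pha` (standing in for the learned spatial activation maps), and the choice to mix unit phasors rather than raw angles are all assumptions made for illustration.

```python
import numpy as np


def fuse_frequency(feat_hsi, feat_lidar, w_amp=0.5, w_pha=0.5):
    """Fuse two (H, W) feature maps by mixing amplitude and phase separately.

    Hypothetical helper: w_amp and w_pha are fixed scalars here, used as a
    stand-in for the adaptive weights described in the abstract.
    """
    # Move both feature maps into the frequency domain.
    spec_hsi = np.fft.fft2(feat_hsi)
    spec_lidar = np.fft.fft2(feat_lidar)

    # Amplitude branch: weighted mix of the two magnitude spectra.
    amp = w_amp * np.abs(spec_hsi) + (1.0 - w_amp) * np.abs(spec_lidar)

    # Phase branch: mix unit phasors instead of raw angles, which avoids
    # wrap-around problems near +/- pi (an assumption of this sketch).
    phasor = (w_pha * np.exp(1j * np.angle(spec_hsi))
              + (1.0 - w_pha) * np.exp(1j * np.angle(spec_lidar)))
    pha = np.angle(phasor)

    # Recombine amplitude and phase, then return to the spatial domain.
    fused_spec = amp * np.exp(1j * pha)
    return np.fft.ifft2(fused_spec).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hsi_feat = rng.standard_normal((8, 8))    # toy HSI feature map
    lidar_feat = rng.standard_normal((8, 8))  # toy LiDAR feature map
    fused = fuse_frequency(hsi_feat, lidar_feat)
    print(fused.shape)
```

Setting both weights to 1 returns the HSI feature map unchanged, and setting both to 0 returns the LiDAR feature map, which gives a quick sanity check that the two branches are combined consistently.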
Zeng et al. (Mon,) studied this question.