The effective fusion of multi-modal remote sensing images, particularly hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data, is pivotal for accurate land use and land cover (LULC) classification. However, this process is hindered by two inherent challenges: pervasive data redundancy and the underutilization of cross-modal complementarity, largely due to the lack of a unifying theoretical framework. To address these limitations, we propose the multi-modal complementary information bottleneck (MCIB) framework, which extends the information bottleneck (IB) principle to learn compact, sufficient, and complementary representations for multi-modal scenes. From a theoretical perspective, we formalize the MCIB objective and introduce structured priors to derive tractable information-theoretic bounds, providing a principled and computationally feasible way to reduce redundancy and enhance complementarity simultaneously. Building on these theoretical insights, we design an end-to-end variational optimization strategy with a novel supervised conditional InfoNCE (SCInfoNCE) objective. By reusing existing model components, this supervised contrastive method efficiently optimizes the conditional mutual information terms that are crucial for synergy. Extensive experiments on benchmark HSI-LiDAR datasets demonstrate the superior classification performance of MCIB. This work not only fills a theoretical gap in multi-modal representation learning but also offers a robust and principled solution for LULC classification using complex heterogeneous remote sensing images.
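For readers unfamiliar with the contrastive objective that SCInfoNCE builds on, the following is a minimal sketch of the standard, unsupervised InfoNCE loss in NumPy. It is not the paper's SCInfoNCE, whose supervised and conditional terms are not specified in the abstract. The function name `info_nce`, the temperature value, and the synthetic "HSI" and "LiDAR" embeddings are all illustrative assumptions.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Standard InfoNCE loss (assumed illustration, not the paper's SCInfoNCE).

    Row i of z_a and row i of z_b form a positive pair; every other row of
    z_b in the batch serves as a negative for row i of z_a.
    """
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the positive pairs.
    return -np.mean(np.diag(log_prob))

# Synthetic embeddings standing in for two modalities (hypothetical data).
rng = np.random.default_rng(0)
hsi = rng.normal(size=(8, 16))
lidar_aligned = hsi + 0.01 * rng.normal(size=(8, 16))  # shares information
lidar_random = rng.normal(size=(8, 16))                # shares none

loss_aligned = info_nce(hsi, lidar_aligned)
loss_random = info_nce(hsi, lidar_random)
```

For a batch of N pairs, log N minus this loss is a lower bound on the mutual information between the two embeddings, which is why the aligned pair should yield a lower loss than the random one. A supervised, conditional variant would presumably also use class labels when forming positives and negatives, but that detail is an assumption here.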
Xiao Pan
Hao Zhu
Bo Yang
IEEE Transactions on Image Processing
Xidian University
Chang'an University
Pan et al. studied this question.
www.synapsesocial.com/papers/69bf86ecf665edcd009e9060 — DOI: https://doi.org/10.1109/tip.2026.3673954