What question did this study set out to answer?

The aim is to improve multimodal land cover classification using an efficient fusion framework that overcomes computational barriers.

June 17, 2026Open Access

LMFusion: Breaking the Computational Barrier for Multimodal Classification in Remote Sensing

Key Points

The aim is to improve multimodal land cover classification using an efficient fusion framework that overcomes computational barriers.
Proposed LMFusion framework integrating hyperspectral and LiDAR data.
Employs a linear-complexity cross-attention mechanism for feature interaction.
Utilizes a selective quantization-aware optimization strategy for enhanced model efficiency.
Achieves overall accuracies of 95.84%, 94.95%, and 99.05% on Houston2013, MUUFL, and Augsburg datasets, respectively.
Outperforms existing multimodal methods with significant improvements in classification performance.
Demonstrates potential for efficient multimodal remote sensing classification in resource-constrained environments.

Abstract

Multi-modal land cover classification plays an important role in remote sensing applications such as urban monitoring and environmental analysis. By integrating complementary information from hyperspectral imagery (HSI) and LiDAR data, multimodal learning can significantly improve classification performance. However, existing Transformer-based fusion methods often suffer from high computational complexity and inefficient cross-modal interaction modeling, which limits their applicability in resource-constrained scenarios. To address these challenges, we propose LMFusion, an efficient framework for multimodal feature learning. Specifically, LMFusion enables efficient bidirectional feature interaction through a linear-complexity cross-attention mechanism and enhances long-range spatial-spectral representation learning with Mamba-based state space modeling, thereby achieving effective multimodal dependency modeling with linear computational complexity. In addition, a selective quantization-aware optimization strategy is introduced to support multiple bit-width settings (down to 1-bit), yielding a more compact and efficient model while improving representation robustness under low-bit constraints. Extensive experiments on the Houston2013, MUUFL, and Augsburg datasets demonstrate the effectiveness of LMFusion. It achieves overall accuracies of 95.84%, 94.95%, and 99.05%, respectively, consistently outperforming representative multimodal classification methods and showing strong potential for accurate and efficient multimodal remote sensing classification.

Ask AI

Helpful

Bookmark

View Full Paper