Multimodal object detection has become an important technique for improving perception robustness in UAV-based remote sensing. Nevertheless, RGB–IR UAV detection remains difficult: degraded illumination destabilizes shallow representations and weakens local discriminative cues, while spatial inconsistencies and fluctuating modality reliability further hinder cross-modal interaction. Existing methods often depend on global illumination estimation or simplistic fusion schemes, and therefore struggle to jointly maintain contextual stability, reliable cross-modal interaction, and compact discriminative representations in complex aerial scenes. To address these issues, this paper proposes LDSDet, an RGB–IR multimodal UAV object detector for challenging illumination conditions. LDSDet integrates three complementary modules: a Long-range Aware Residual Convolution (LARC) module that enhances contextual perception and stabilizes shallow features; a Dynamic Attention-based Cross-modal Fusion (DACF) block that performs spatially adaptive RGB–IR interaction; and a lightweight SeqShuffleGate (SSG) module that suppresses redundant fusion responses to yield compact and discriminative multimodal representations. Extensive experiments on DroneVehicle, FLIR-Aligned, and LLVIP demonstrate the effectiveness of LDSDet, which achieves 85.2% mAP50, 45.3% mAP, and 67.1% mAP on the three datasets, respectively, indicating strong robustness under day–night alternation, low-light environments, and complex illumination variations.
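To make the idea of spatially adaptive RGB–IR interaction concrete, the following is a minimal sketch of per-pixel weighted fusion. It is not the DACF block itself, whose design is not specified here. The function name `fuse_pixelwise`, the `temperature` parameter, and the use of absolute activation as a reliability score are all illustrative assumptions; a learned attention branch would normally produce the scores.

```python
import math

def fuse_pixelwise(rgb_feat, ir_feat, temperature=1.0):
    """Fuse two single-channel feature maps with per-pixel softmax weights.

    Assumption: the absolute activation stands in for a reliability score;
    this is a hypothetical placeholder for a learned attention branch.
    """
    fused = []
    for rgb_row, ir_row in zip(rgb_feat, ir_feat):
        fused_row = []
        for r, i in zip(rgb_row, ir_row):
            # Softmax over the two modality scores at this location.
            s_r, s_i = abs(r) / temperature, abs(i) / temperature
            m = max(s_r, s_i)  # subtract the max for numerical stability
            e_r, e_i = math.exp(s_r - m), math.exp(s_i - m)
            w_r = e_r / (e_r + e_i)
            # Convex combination: the more reliable modality dominates.
            fused_row.append(w_r * r + (1.0 - w_r) * i)
        fused.append(fused_row)
    return fused

# Toy 2x2 maps: RGB responds only in the right column, IR only in the left.
rgb = [[0.0, 2.0], [0.0, 2.0]]
ir = [[2.0, 0.0], [2.0, 0.0]]
fused = fuse_pixelwise(rgb, ir)
```

In this toy case each location is dominated by whichever modality responds more strongly there, which is the behavior a single global illumination weight cannot express.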
Sun et al. studied this question.