
June 5, 2026 · Open Access

LDSDet: Long-Range Context and Dynamic Cross-Modal Alignment for Multimodal Object Detection Under Challenging Illumination

Key Points

This research aims to improve RGB-IR multimodal object detection in challenging lighting environments for UAV applications.
Developed the LDSDet model, which combines a Long-range Aware Residual Convolution (LARC) module, a Dynamic Attention-based Cross-modal Fusion (DACF) block, and a SeqShuffleGate (SSG) module.
Evaluated performance on DroneVehicle, FLIR-Aligned, and LLVIP datasets.
Measured detection accuracy using mean Average Precision (mAP) metrics.
Achieved 85.2% mAP50 on DroneVehicle, 45.3% mAP on FLIR-Aligned, and 67.1% mAP on LLVIP.
Demonstrated strong robustness under varying illumination, including day–night alternation and low-light environments.
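The key points mix two metrics. In common detection practice, mAP50 is the class-averaged average precision (AP) at an IoU threshold of 0.5, while an unqualified "mAP" usually denotes the COCO-style average over IoU thresholds 0.50 to 0.95 in steps of 0.05; the source does not state the exact protocol the authors used. The following plain-Python sketch illustrates the difference under those assumptions. It uses greedy, score-ordered matching and all-point interpolated AP (the official COCO tool uses 101-point interpolation instead), and every name in it is illustrative rather than taken from the paper or from any evaluation library.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]        # (image_id, score, box)
GroundTruth = Dict[str, List[Box]]        # image_id -> boxes of one class


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: GroundTruth, thr: float) -> float:
    """Illustrative all-point interpolated AP for one class at one IoU threshold."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    flags: List[int] = []  # 1 = true positive, 0 = false positive
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        # Greedily match the highest-IoU ground truth not yet claimed
        best_iou, best_idx = 0.0, -1
        for g, gt_box in enumerate(gts.get(img, [])):
            if not used[img][g]:
                o = iou(box, gt_box)
                if o > best_iou:
                    best_iou, best_idx = o, g
        if best_idx >= 0 and best_iou >= thr:
            used[img][best_idx] = True
            flags.append(1)
        else:
            flags.append(0)
    precision, recall, tp = [], [], 0
    for rank, flag in enumerate(flags, start=1):
        tp += flag
        precision.append(tp / rank)
        recall.append(tp / n_gt)
    # Make precision non-increasing: each recall level uses the best precision to its right
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p  # area under the interpolated precision-recall curve
        prev_r = r
    return ap


MAP50 = [0.5]                                   # thresholds for "mAP50"
MAP50_95 = [0.5 + 0.05 * i for i in range(10)]  # COCO-style thresholds for "mAP"


def mean_ap(dets_by_class: Dict[str, Sequence[Detection]],
            gts_by_class: Dict[str, GroundTruth],
            thresholds: Sequence[float]) -> float:
    """Average AP over the IoU thresholds, then over classes."""
    per_class = []
    for cls, gts in gts_by_class.items():
        dets = dets_by_class.get(cls, [])
        per_class.append(sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds))
    return sum(per_class) / len(per_class)
```

Because mAP averages over stricter thresholds, a box whose IoU with its target is exactly 0.5 counts fully toward mAP50 but contributes only one tenth toward mAP, which is why mAP values are typically well below mAP50 values.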

Abstract

In remote sensing, multimodal object detection has emerged as an important technique for enhancing perception robustness in UAV-based scenarios. Nevertheless, RGB–IR UAV detection remains difficult: degraded illumination destabilizes shallow representations and weakens local discriminative cues, while spatial inconsistencies between modalities and fluctuating modality reliability further hinder cross-modal interaction. In addition, existing methods, which often rely on global illumination estimation or simplistic fusion schemes, struggle to jointly maintain contextual stability, reliable cross-modal interaction, and compact discriminative representations in complex aerial scenes. To address these issues, this paper proposes LDSDet, an RGB–IR multimodal UAV object detector designed for challenging illumination conditions. Specifically, LDSDet integrates three complementary modules: a Long-range Aware Residual Convolution (LARC) module that enhances contextual perception and stabilizes shallow features; a Dynamic Attention-based Cross-modal Fusion (DACF) block that performs spatially adaptive RGB–IR interaction; and a lightweight SeqShuffleGate (SSG) module that suppresses redundant fusion responses to yield compact and discriminative multimodal representations. Extensive experiments on DroneVehicle, FLIR-Aligned, and LLVIP demonstrate the effectiveness of LDSDet, which achieves 85.2% mAP50, 45.3% mAP, and 67.1% mAP, respectively, and show strong robustness under day–night alternation, low-light environments, and complex illumination variations.
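The abstract describes the three modules only at a high level, so the following plain-Python sketch illustrates generic versions of the underlying ideas. It is not the authors' implementation. A residual large-kernel mean filter stands in for LARC's long-range context, a per-pixel softmax weighting between the two modalities stands in for DACF's spatially adaptive RGB–IR interaction, and a ShuffleNet-style channel shuffle followed by a per-channel sigmoid gate stands in for SSG. All function names, the fixed uniform kernel, the mean-absolute-activation "reliability" score, and the gate parameters are hypothetical assumptions chosen for readability.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W channel
Feature = List[Map]      # C channels, each H x W


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def residual_large_kernel(feat: Feature, k: int = 7, alpha: float = 1.0) -> Feature:
    """Hypothetical LARC-like step: x + alpha * (k x k mean filter of x), per channel."""
    r = k // 2
    out: Feature = []
    for ch in feat:
        h, w = len(ch), len(ch[0])
        new_ch: Map = []
        for i in range(h):
            row = []
            for j in range(w):
                acc = 0.0
                for ii in range(max(0, i - r), min(h, i + r + 1)):
                    for jj in range(max(0, j - r), min(w, j + r + 1)):
                        acc += ch[ii][jj]
                # Zero padding: out-of-bounds taps contribute 0, divisor stays k*k
                row.append(ch[i][j] + alpha * acc / (k * k))
            new_ch.append(row)
        out.append(new_ch)
    return out


def adaptive_fusion(rgb: Feature, ir: Feature, temperature: float = 1.0) -> Feature:
    """Hypothetical DACF-like step: per-pixel softmax weights over the two modalities."""
    c, h, w = len(rgb), len(rgb[0]), len(rgb[0][0])
    fused: Feature = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            # Assumed reliability score: mean absolute activation at this location
            s_rgb = sum(abs(rgb[k][i][j]) for k in range(c)) / (c * temperature)
            s_ir = sum(abs(ir[k][i][j]) for k in range(c)) / (c * temperature)
            m = max(s_rgb, s_ir)  # subtract the max for a stable softmax
            e_rgb, e_ir = math.exp(s_rgb - m), math.exp(s_ir - m)
            w_rgb = e_rgb / (e_rgb + e_ir)
            for k in range(c):
                fused[k][i][j] = w_rgb * rgb[k][i][j] + (1.0 - w_rgb) * ir[k][i][j]
    return fused


def channel_shuffle(feat: Feature, groups: int) -> Feature:
    """ShuffleNet-style shuffle: view as (groups, C/groups), transpose, flatten."""
    c = len(feat)
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    return [feat[g * per_group + p] for p in range(per_group) for g in range(groups)]


def shuffle_gate(feat: Feature, groups: int, weight: List[float], bias: List[float]) -> Feature:
    """Hypothetical SSG-like step: shuffle channels, then scale each by a sigmoid gate."""
    out: Feature = []
    for k, ch in enumerate(channel_shuffle(feat, groups)):
        h, w = len(ch), len(ch[0])
        pooled = sum(sum(row) for row in ch) / (h * w)  # global average pooling
        gate = sigmoid(weight[k] * pooled + bias[k])    # gate value in (0, 1)
        out.append([[gate * v for v in row] for row in ch])
    return out


if __name__ == "__main__":
    rgb = [[[0.1, 0.2], [0.0, 0.1]], [[0.0, 0.1], [0.2, 0.0]]]  # dim visible features
    ir = [[[0.9, 0.8], [0.7, 0.9]], [[0.6, 0.7], [0.8, 0.5]]]   # strong thermal response
    fused = adaptive_fusion(residual_large_kernel(rgb, k=3), residual_large_kernel(ir, k=3))
    print(shuffle_gate(fused, groups=2, weight=[1.0, 1.0], bias=[0.0, 0.0]))
```

In a real detector these steps would operate on learned convolutional feature maps with trainable kernels and gate weights; here the kernel is a fixed mean filter and the gate parameters are passed in by hand, purely to make the data flow (context enhancement, per-location modality weighting, then channel mixing and suppression) visible.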
