What question did this study set out to answer?

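The study asks how to improve UAV object detection accuracy by fusing RGB and infrared features adaptively at the level of local image regions, rather than applying a single global fusion strategy across the whole image.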

February 14, 2026

Adaptive Fine-Grained Fusion Network for Multimodal UAV Object Detection

Key Points

Aimed to improve UAV object detection accuracy through finer-grained integration of RGB and infrared features.
Developed a local feature consistency-based modality fusion module for weighted local feature aggregation.
Introduced a mutual information-guided feature contrastive loss that encourages the preservation of modality-specific information during the early training phase.
Focused on optimizing detection performance despite occlusions and illumination variations.
Achieved state-of-the-art performance on multimodal UAV object detection benchmarks.
Reported improved detection in scenarios with object occlusion, a common problem in UAV perspectives.
Demonstrated effective fusion of RGB and infrared modalities under varying conditions.

Abstract

Multimodal perception and fusion play a vital role in unmanned aerial vehicle (UAV) object detection. Existing methods typically adopt global fusion strategies across modalities. However, due to illumination variation, the effectiveness of RGB and infrared modalities may differ across local regions within the same image, particularly in UAV perspectives where occlusions and dense small objects are prevalent, leading to suboptimal performance of global fusion methods. To address this issue, we propose an adaptive fine-grained fusion network for multimodal UAV object detection. First, we design a local feature consistency-based modality fusion module, which adaptively assigns local fusion weights according to the structural consistency of high-response regions across modalities, thereby enabling more effective aggregation of object-relevant features. Second, we introduce a mutual information-guided feature contrastive loss to encourage the preservation of modality-specific information during the early training phase. Experimental results demonstrate that the proposed method effectively addresses the issue of object occlusion in UAV perspectives, achieving state-of-the-art performance on multimodal UAV object detection benchmarks. Code will be available at https://github.com/lingf5877/AFFNet.
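The idea of assigning local fusion weights can be illustrated with a small sketch. This is not the authors' module: the function name `local_consistency_fusion`, the patch size, the use of mean absolute activation as a "response" map, cosine similarity as the consistency measure, and the blending rule are all assumptions chosen only to make the concept concrete. Where the two modalities agree within a patch, the sketch averages them; where they disagree, it favors the modality with the stronger response.

```python
import numpy as np


def _to_patches(x, p):
    """Split an (H, W) map into non-overlapping p x p patches: (H/p, W/p, p*p)."""
    h, w = x.shape
    return (x.reshape(h // p, p, w // p, p)
             .transpose(0, 2, 1, 3)
             .reshape(h // p, w // p, p * p))


def local_consistency_fusion(rgb, ir, patch=4, eps=1e-8):
    """Illustrative per-patch weighted fusion of two (C, H, W) feature maps.

    Hypothetical simplification, not the published method.
    Returns the fused (C, H, W) map and the (H/patch, W/patch) RGB weights.
    """
    assert rgb.shape == ir.shape, "feature maps must have the same shape"
    _, h, w = rgb.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"

    # Response maps: mean absolute activation over channels, shape (H, W).
    p_rgb = _to_patches(np.abs(rgb).mean(axis=0), patch)
    p_ir = _to_patches(np.abs(ir).mean(axis=0), patch)

    # Assumed consistency measure: cosine similarity of the two response
    # patterns inside each patch (in [0, 1] because responses are non-negative).
    dot = (p_rgb * p_ir).sum(axis=-1)
    norms = np.linalg.norm(p_rgb, axis=-1) * np.linalg.norm(p_ir, axis=-1)
    consistency = dot / (norms + eps)

    # Share of the total patch response contributed by the RGB modality.
    e_rgb, e_ir = p_rgb.sum(axis=-1), p_ir.sum(axis=-1)
    rgb_share = e_rgb / (e_rgb + e_ir + eps)

    # Consistent patches are averaged; inconsistent ones follow the stronger modality.
    w_rgb = consistency * 0.5 + (1.0 - consistency) * rgb_share

    # Expand each patch weight to a patch x patch block, then blend per pixel.
    w_full = np.kron(w_rgb, np.ones((patch, patch)))
    fused = w_full * rgb + (1.0 - w_full) * ir
    return fused, w_rgb
```

As a usage example, an RGB feature map that is zero in one half of the image (for instance, a dark region) receives a weight near zero there, so the fused features in that half come almost entirely from the infrared map, while regions where both modalities respond similarly are blended about equally.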
