What question did this study set out to answer?

The aim is to enhance feature representation and contextual modeling for small object detection in UAV imagery.

May 24, 2026 | Open Access

MDCL-DETR: Multi-Domain Enhancement and Cross-Layer Feature Fusion for Small Object Detection

Key Points

The aim is to enhance feature representation and contextual modeling for small object detection in UAV imagery.
Developed a multi-domain enhancement module (MDEM) that fuses spatial and frequency-domain features to separate object features from background interference.
Introduced a cross-layer feature extraction module (CLEM) that preserves the spatial details of small objects while integrating multi-scale features.
Proposed a gated Mamba fusion module (GMFM) that dynamically weights local details against global context.
Achieved an mAP50 of 54.1% on the VisDrone2019 dataset.
Achieved an mAP50 of 56.2% on the AI-TOD dataset.
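
For readers unfamiliar with the metric: mAP50 is mean average precision where a predicted box counts as a match only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below is an illustration, not the study's evaluation code; the function names and the example boxes are hypothetical. It shows why small objects are hard under this criterion: the same 3-pixel localization error fails the threshold for a 10x10 box but passes easily for a 100x100 box.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred, truth):
    """True if the prediction meets the IoU >= 0.5 criterion behind mAP50."""
    return iou(pred, truth) >= 0.5


# Hypothetical boxes: the same 3-pixel shift applied to a small and a large object.
small_truth, small_pred = (0, 0, 10, 10), (3, 3, 13, 13)
large_truth, large_pred = (0, 0, 100, 100), (3, 3, 103, 103)

print(round(iou(small_pred, small_truth), 3), is_match_at_50(small_pred, small_truth))
print(round(iou(large_pred, large_truth), 3), is_match_at_50(large_pred, large_truth))
```

The full metric additionally ranks predictions by confidence and averages precision over recall levels and classes; only the matching rule is sketched here.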

Abstract

Small object detection in uncrewed aerial vehicle (UAV) imagery is hindered by limited pixels, insufficient detail, and strong background interference, leading to weak feature representation and poor contextual modeling. To address these issues, we propose a multi-domain enhancement and cross-layer feature fusion detection Transformer (MDCL-DETR) with progressive feature processing. First, a multi-domain enhancement module (MDEM) based on the CSP (cross stage partial) structure is proposed, which fuses spatial and frequency-domain features in a lightweight manner to enhance object detail and global structure while effectively distinguishing object features from background interference. Second, a cross-layer feature extraction module (CLEM) is introduced to aggregate multi-scale features across layers, alleviate the information loss caused by downsampling, and preserve the spatial details of small objects while integrating high-level contextual semantics. Third, a gated Mamba fusion module (GMFM) is proposed, which adopts the Mamba architecture for long-range dependency modeling of multi-scale features and integrates a gating mechanism to realize dynamic weighted fusion of local details and global context, further improving feature discriminability and global modeling capability. Finally, a fine-grained enhancement module (FGEM) is designed, which leverages feature reorganization and adaptive feature extraction to reinforce and compensate for fine-grained features. Extensive experimental results validate the effectiveness and generalization of the proposed method, which achieves mAP50 scores of 54.1% and 56.2% on the VisDrone2019 and AI-TOD datasets, respectively.
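
The abstract does not give the GMFM equations, so the following is only a minimal sketch of what a generic gating mechanism for "dynamic weighted fusion of local details and global context" can look like, not the authors' module. It assumes a per-element sigmoid gate; the function names, the scalar weights `w_local`, `w_global`, and `bias` are hypothetical stand-ins for learned parameters, and the Mamba branch that would produce the global features is not modeled.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(local_feat, global_feat, w_local=1.0, w_global=1.0, bias=0.0):
    """Fuse two equal-length feature vectors with a per-element gate.

    gate  = sigmoid(w_local * l + w_global * g + bias)
    fused = gate * l + (1 - gate) * g

    The weights are hypothetical placeholders for learned parameters.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for l, g in zip(local_feat, global_feat):
        gate = sigmoid(w_local * l + w_global * g + bias)
        # Convex combination: the result always lies between l and g.
        fused.append(gate * l + (1.0 - gate) * g)
    return fused


# Hypothetical feature values for illustration only.
local_details = [2.0, 0.5, -1.0]
global_context = [-2.0, 0.5, 3.0]
print(gated_fusion(local_details, global_context))
```

Because the gate is computed from the inputs themselves, the mix between the two branches varies per element rather than being a fixed ratio, which is the sense in which such a fusion is "dynamic".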
