May 8, 2026 | Open Access

Region-aware and cross-scale feature mining for UAV object detection

Key Points

The study aims to improve object detection in UAV imagery by addressing issues related to background interference and small object recognition.
Developed MRCDet, a Mamba-based Region-aware and Cross-scale latent feature mining Detector, for UAV applications.
Used a Patch-aware Feature Extractor (MPAFE) to separate object characteristics from background noise.
Used Multi-scale Parallel Dilated Convolutions (MPDConv) for scale-robust semantic extraction of small-object details.
Raised the overall detection metric by 5.4% and 7.4% over baseline models on the VisDrone and UAVDT benchmarks.
Raised the small-object detection metric by 3.9% and 7.7% on the same benchmarks, indicating greater efficacy in challenging environments.
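
To make the idea behind parallel dilated convolutions concrete, the following is a minimal pure-Python sketch, not the authors' MPDConv. It runs several convolution branches over the same input, each with its own dilation rate, and averages the results so that one output pixel sees context at several scales. The function names, the shared 3x3 kernel, and fusion by averaging are all assumptions made for illustration; the published module may use learned per-branch weights and a different fusion step.

```python
def dilated_conv2d(image, kernel, dilation):
    """Zero-padded, same-size 2D cross-correlation with a square odd-sized kernel.

    Kernel taps are spaced `dilation` pixels apart, which widens the
    receptive field without adding weights.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    # Taps that fall outside the image contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def parallel_dilated_block(image, branches):
    """Apply each (kernel, dilation) branch to the same input and average them.

    Averaging is a hypothetical fusion choice used only for this sketch.
    """
    outputs = [dilated_conv2d(image, kern, dil) for kern, dil in branches]
    h, w = len(image), len(image[0])
    return [
        [sum(o[y][x] for o in outputs) / len(outputs) for x in range(w)]
        for y in range(h)
    ]


# A 7x7 image with a single bright pixel in the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]

fused = parallel_dilated_block(image, [(ones, 1), (ones, 2)])
```

In this toy input, the dilation-1 branch responds in the 3x3 neighbourhood of the bright pixel, while the dilation-2 branch responds at pixels two steps away, so the fused map carries both near and wider context.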

Abstract

Unmanned Aerial Vehicles (UAVs) have become a prevalent tool for aerial image analysis, thanks to their agility in low-altitude flight and real-time sensing. However, objects in UAV imagery typically occupy minimal pixel areas and lack sufficient visual cues, leaving them highly vulnerable to complex background clutter. Moreover, current state-of-the-art detectors struggle to separate foreground objects from background elements when capturing global contextual dependencies. To overcome these bottlenecks, we introduce a Mamba-based Region-aware and Cross-scale latent feature mining Detector (MRCDet). Specifically, we design a Mamba-based Patch-aware Network (MPANet) as the backbone, incorporating a novel Patch-aware Feature Extractor (MPAFE) to isolate object characteristics from background interference. Within MPAFE, an explicit region classification loss is applied to compel the network to highlight object areas and suppress irrelevant noise during regional context aggregation. Additionally, a Cross Mamba-based Potential Small Object Mining Module (CPSOMM) is developed to prevent spatial information degradation. By leveraging Multi-scale Parallel Dilated Convolutions (MPDConv) for scale-robust semantic extraction, alongside a Cross-Mamba structure for inter-spatial connections, CPSOMM recovers hidden small-object details in shallow feature maps using high-level semantics. Extensive experiments on the VisDrone and UAVDT benchmarks confirm the superiority of our framework. Compared to baselines, MRCDet boosts the overall detection metric by 5.4% and 7.4%, while improving the small-object detection metric by 3.9% and 7.7%, demonstrating its efficacy and stability.
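
The abstract does not reproduce the formula for the region classification loss, so the following is only a hedged sketch of what a patch-level object-versus-background loss could look like. It assumes a patch is labelled as an object region when it overlaps any ground-truth box, and it uses plain binary cross-entropy; both choices, and the names `patch_labels` and `region_bce`, are hypothetical and are not taken from the paper.

```python
import math


def patch_labels(img_w, img_h, patch, boxes):
    """Label each patch 1 if it overlaps any box (x1, y1, x2, y2), else 0.

    Assumption for this sketch: any overlap counts as an object region.
    """
    cols, rows = img_w // patch, img_h // patch
    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            px1, py1 = c * patch, r * patch
            px2, py2 = px1 + patch, py1 + patch
            for x1, y1, x2, y2 in boxes:
                if x1 < px2 and x2 > px1 and y1 < py2 and y2 > py1:
                    labels[r][c] = 1
                    break
    return labels


def region_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between patch probabilities and patch labels."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, t in zip(prob_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# An 8x8 image split into 4x4 patches, with one small object in the top-left patch.
labels = patch_labels(8, 8, 4, [(1, 1, 3, 3)])

good = region_bce([[0.9, 0.1], [0.1, 0.1]], labels)  # object patch scored high
bad = region_bce([[0.1, 0.9], [0.9, 0.9]], labels)   # background scored high
```

Under these assumptions, predictions that score the object patch high and the background patches low give a smaller loss than the reverse, which is the behaviour the abstract attributes to the loss: highlighting object areas and suppressing background.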
