Unmanned Aerial Vehicles (UAVs) have become a prevalent tool for aerial image analysis, thanks to their agility in low-altitude flight and real-time sensing. However, objects in UAV imagery typically occupy very few pixels and offer few visual cues, leaving them highly vulnerable to complex background clutter. Moreover, current state-of-the-art detectors struggle to separate foreground objects from background elements when capturing global contextual dependencies. To overcome these bottlenecks, we introduce a Mamba-based Region-aware and Cross-scale latent feature mining Detector (MRCDet). Specifically, we design a Mamba-based Patch-aware Network (MPANet) as the backbone, incorporating a novel Patch-aware Feature Extractor (MPAFE) to isolate object characteristics from background interference. Within MPAFE, an explicit region classification loss (Formula: see text) compels the network to highlight object areas and suppress irrelevant noise during regional context aggregation. Additionally, we develop a Cross Mamba-based Potential Small Object Mining Module (CPSOMM) to prevent spatial information degradation. CPSOMM combines Multi-scale Parallel Dilated Convolutions (MPDConv) for scale-robust semantic extraction with a Cross-Mamba structure for inter-spatial connections, using high-level semantics to recover hidden small-object details in shallow feature maps. Extensive experiments on the VisDrone and UAVDT benchmarks confirm the superiority of our framework. Compared with the baselines, MRCDet boosts the overall detection metric (Formula: see text) by 5.4% and 7.4%, while improving the small-object metric (Formula: see text) by 3.9% and 7.7%, demonstrating its efficacy and stability.
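To make the idea of parallel dilated convolutions concrete, the following is a minimal sketch in pure Python on a 1D signal. It is not the MPDConv module itself: the abstract does not state the kernel sizes, dilation rates, or fusion rule, so the dilation set (1, 2, 3), the averaging fusion, and the function names `dilated_conv1d` and `parallel_dilated_conv1d` are all illustrative assumptions.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def parallel_dilated_conv1d(
    signal: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 3),  # assumed rates, not taken from the paper
) -> List[float]:
    """Apply the same kernel at several dilation rates and average the branches."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    # Averaging is an assumed fusion rule; concatenation or a learned fusion are alternatives.
    return [sum(values) / len(branches) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    print(parallel_dilated_conv1d(impulse, [1, 1, 1]))
```

Each branch sees the same input with a different receptive field, so a single impulse spreads to neighbours at distances 1, 2, and 3 once the branches are fused. This is the property that makes such a block less sensitive to object scale.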
Zhu et al. (Sun,) studied this question.