Small object detection in unmanned aerial vehicle (UAV) remote sensing images remains challenging due to large variations in object scale, dense object distributions, and complex background interference. Although Transformer-based detectors have improved global context modeling for remote sensing object detection, many existing designs still rely on spatial geometric relationships and conventional cross-level fusion, which may limit the aggregation of spatially scattered small-object features and introduce background interference. To address these issues, this paper proposes a Multi-granularity Prompt and Intensity Guidance Detection Transformer (MPI-DETR), an efficient end-to-end Transformer-based detector for small-object detection in UAV remote sensing images. MPI-DETR consists of three key components: a Dual-Stream Ranked Self-Attention (DRSA) module for intensity-ordered global feature aggregation, a Bilateral Tanh Gating and Cosine Attention Feature Alignment Module (BTC-FAM) for noise-resistant cross-level alignment, and a Prompt-driven Multi-granularity Fusion (PMGF) module for enhancing weak small-object details. Experiments on AI-TOD, DIOR, and NWPU VHR-10 demonstrate that MPI-DETR achieves AP50 scores of 43.8%, 87.5%, and 92.5%, respectively. Compared with the RT-DETR-R18 baseline, MPI-DETR improves AP50 by 3.0, 1.0, and 3.9 percentage points on the three datasets and increases small-object AP (APS) by 2.9, 2.9, and 11.6 percentage points. It also surpasses the strongest competing model on each dataset by 1.7, 1.0, and 1.3 percentage points in AP50. These results indicate that MPI-DETR provides a robust and efficient solution for small-object perception in complex UAV remote sensing applications, especially in scenes with dense targets, background interference, and limited computational resources.
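The reported margins imply the AP50 scores of the comparison models, which the abstract does not state directly. The following minimal sketch backs them out by subtraction. The variable and function names are illustrative only, and the derived values are implied by the rounded figures above rather than reported results, so they may differ from the exact numbers by rounding.

```python
# AP50 figures (%) and margins (percentage points) as stated in the abstract.
DATASETS = ("AI-TOD", "DIOR", "NWPU VHR-10")
mpi_detr_ap50 = (43.8, 87.5, 92.5)
gain_over_baseline = (3.0, 1.0, 3.9)   # vs. RT-DETR-R18
gain_over_strongest = (1.7, 1.0, 1.3)  # vs. strongest compared model


def implied_scores(reported, gains):
    """Subtract each margin from the reported score (hypothetical helper).

    Results are rounded to one decimal to match the reported precision.
    """
    return tuple(round(r - g, 1) for r, g in zip(reported, gains))


# Derived values: implied by the abstract, not reported in it.
baseline_ap50 = implied_scores(mpi_detr_ap50, gain_over_baseline)
strongest_ap50 = implied_scores(mpi_detr_ap50, gain_over_strongest)

for name, base, best, ours in zip(
    DATASETS, baseline_ap50, strongest_ap50, mpi_detr_ap50
):
    print(f"{name}: baseline {base}, strongest compared {best}, MPI-DETR {ours}")
```

Under these figures, the implied baseline AP50 is 40.8, 86.5, and 88.6, and the implied strongest compared AP50 is 42.1, 86.5, and 91.2. On DIOR both margins are 1.0, so the two implied values coincide.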
Zhang et al. (Mon,) studied this question.