Oriented object detection in remote sensing images is challenging because objects appear at arbitrary orientations and their number varies widely from image to image. Although detection transformer (DETR)-based methods have recently made promising progress in remote sensing object detection, they rely on a fixed query budget, which is poorly matched to scenes whose object counts vary significantly because of large-scale imaging and complex spatial distributions. In sparse scenes, a fixed budget produces redundant queries and, under one-to-one matching, many high-quality negatives, which may hinder optimization efficiency and degrade performance. To address this issue, we propose DQA-DETR, a transformer-based dynamic query aggregation detection framework that alleviates query redundancy in remote sensing scenarios. DQA-DETR incorporates three key modules: an aggregation center predictor, an aggregation center selector, and a query aggregator, which respectively predict the required number of representative queries, select high-quality aggregation centers, and aggregate semantically related queries via a multihead attention mechanism. This strategy adapts the number of queries involved in matching to the object quantity in each image, reducing redundancy and improving the model’s adaptability to both sparse and dense scenes. Extensive experiments on the DOTA v1.0 and v1.5 datasets demonstrate that DQA-DETR improves detection performance while maintaining the end-to-end advantages of DETR, and that it is competitive with state-of-the-art methods.
Yao et al. (Thu,) studied this question.
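The predict, select, and aggregate pipeline summarized in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`select_centers`, `aggregate`, `dynamic_query_aggregation`) are hypothetical, the predicted object count is passed in as a plain integer rather than produced by a learned predictor, centers are chosen by a simple top-k over confidence scores, and the multihead attention uses no learned projections. It only shows how a per-image count could shrink a fixed query set to a smaller set of aggregated queries.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_centers(scores, k):
    # Hypothetical selector: indices of the k highest-scoring queries.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def aggregate(queries, center_ids, num_heads):
    # Each center attends to all queries, one head at a time.
    # Assumption: identity projections stand in for learned Q/K/V weights.
    dim = len(queries[0])
    assert dim % num_heads == 0, "embedding size must divide evenly into heads"
    head_dim = dim // num_heads
    out = []
    for c in center_ids:
        merged = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            q = queries[c][lo:hi]
            # Scaled dot-product similarity between the center and every query.
            logits = [
                sum(a * b for a, b in zip(q, other[lo:hi])) / math.sqrt(head_dim)
                for other in queries
            ]
            weights = softmax(logits)
            # Attention-weighted average of this head's slice of every query.
            for d in range(lo, hi):
                merged.append(sum(w * other[d] for w, other in zip(weights, queries)))
        out.append(merged)
    return out

def dynamic_query_aggregation(queries, scores, predicted_count, num_heads=2):
    # Clamp the (assumed, externally predicted) count to a valid range.
    k = max(1, min(predicted_count, len(queries)))
    centers = select_centers(scores, k)
    return centers, aggregate(queries, centers, num_heads)

# Toy example: four 4-dimensional queries forming two loose groups.
queries = [
    [1.0, 0.0, 0.0, 1.0],
    [0.9, 0.1, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.1, 0.9, 1.0, 0.1],
]
scores = [0.9, 0.4, 0.8, 0.3]
centers, aggregated = dynamic_query_aggregation(queries, scores, predicted_count=2)
```

In this toy setting, only the two aggregated queries would go on to matching instead of all four, which is the kind of redundancy reduction the abstract describes for sparse scenes.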