Open-vocabulary object detection in remote sensing aims to detect novel categories not seen during training, which is crucial for practical aerial image analysis applications. While some approaches accomplish this task through large-scale data construction, such methods incur substantial annotation and computational costs. In contrast, we focus on efficient utilization of limited datasets. However, existing methods such as CastDet struggle with inefficient data utilization and class imbalance issues in pseudo-label generation for novel categories. We propose an enhanced open-vocabulary detection framework that addresses these limitations through two key innovations. First, we introduce a selective masking strategy that enables direct utilization of partially annotated images by masking base category regions in teacher model inputs. This approach eliminates the need for strict data separation and significantly improves data efficiency. Second, we develop a dynamic frequency-based class weighting that automatically adjusts category weights based on real-time pseudo-label statistics to mitigate class imbalance issues. Our approach integrates these components into a student–teacher learning framework with RemoteCLIP for novel category classification. Comprehensive experiments demonstrate significant improvements on both datasets: on VisDroneZSD, we achieve 42.7% overall mAP and 41.4% harmonic mean, substantially outperforming existing methods. On DIOR dataset, our method achieves 63.7% overall mAP with 49.5% harmonic mean. Our framework achieves more balanced performance between base and novel categories, providing a practical and data-efficient solution for open-vocabulary aerial object detection.
Building similarity graph...
Analyzing shared references across papers
Loading...
Siyuan Wang
Yu Song
Jianwei Xiang
Remote Sensing
National University of Defense Technology
Building similarity graph...
Analyzing shared references across papers
Loading...
Wang et al. (Thu,) studied this question.
www.synapsesocial.com/papers/68e9b1d0ba7d64b6fc132cac — DOI: https://doi.org/10.3390/rs17193385