ABSTRACT: Recent advances in earth vision object detection highlight the difficulty of tiny object detection, which stems primarily from class imbalance between foreground and background, inadequate semantic signals, and limited pixel information. Current object detectors struggle with small objects because they lack discriminative feature supervision, which leads to suboptimal results. In aerial object detection, anchor-based two-stage detectors have significantly improved performance, but they can also cause divergence in object features, which hampers network learning. This study presents the Feature Enhanced Attention Module (FEAM), the Anchor Adaption Region Proposal Network Head (A2RPH), and a Stacked Sparse Autoencoder (SSAE). The SSAE learns high-level features from unlabelled aerial images in an unsupervised manner. To increase the discriminability of the learnt features, supervised learning is then applied to refine the feature representation, and these high-level features are fed to a logistic regression classifier to improve the model. A2RPH adds a new anchor bias learning branch on the feature map to perform anchor adaptive learning, which enables better positive and negative sample assignment in the Region Proposal Network (RPN). To achieve better feature representation, FEAM applies Gaussian mask supervision to attention and introduces global features and mask attention on top of the Feature Pyramid Network (FPN).
INDEX TERMS: Stacked Sparse Autoencoder, Feature Enhanced Attention Module (FEAM), aerial images, anchor adaption, tiny object detection.
Rubadevi et al. studied this question.
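The abstract does not give the exact form of the Gaussian mask supervision used in FEAM. As a rough illustration only, the sketch below shows one common way such a target could be built: a 2D Gaussian is centred on each ground-truth box, and a predicted attention map is compared against it with binary cross-entropy. The function names `gaussian_mask` and `mask_bce_loss`, the `sigma_scale` parameter, and the choice of loss are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def gaussian_mask(height, width, boxes, sigma_scale=0.25):
    """Render a soft foreground mask with one 2D Gaussian per box.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixel coordinates.
    sigma_scale: hypothetical hyperparameter; sigma = sigma_scale * box side.
    """
    # Pixel coordinate grids: ys holds row indices, xs holds column indices.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    mask = np.zeros((height, width), dtype=np.float64)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Floor sigma at one pixel so tiny boxes still produce a usable peak.
        sx = max(sigma_scale * (x1 - x0), 1.0)
        sy = max(sigma_scale * (y1 - y0), 1.0)
        g = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                     + (ys - cy) ** 2 / (2 * sy ** 2)))
        # Overlapping objects keep the strongest response at each pixel.
        mask = np.maximum(mask, g)
    return mask


def mask_bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a predicted attention map and the soft mask."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


if __name__ == "__main__":
    # One 8x8 object inside a 32x32 feature map.
    target = gaussian_mask(32, 32, [(8, 8, 16, 16)])
    uniform_attention = np.full_like(target, 0.5)
    print(mask_bce_loss(uniform_attention, target))
```

A soft target of this kind gives the attention branch a graded signal that peaks at the object centre, which is one plausible reason to prefer it over a hard binary box mask when objects cover only a few pixels.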