Objects in remote sensing images often exhibit large scale variations, high aspect ratios, and diverse orientations. The anisotropic spatial distribution of their features, together with the coupling of different attribute parameters, creates a conflict between feature representation and boundary regression. Detection methods based on square-kernel convolution have a limited receptive field and therefore lack holistic perception of large or slender objects; simply enlarging the receptive field captures more context to aid object perception, but it also introduces a large amount of background noise, which makes feature extraction inaccurate. In addition, the extracted features suffer from feature conflict and a discontinuous loss during parameter regression. Existing methods often neglect the joint optimization of these aspects. To address these challenges, this paper proposes SODE-Net as a systematic solution. First, we design a multi-scale fusion and spatially orthogonal convolution (MSSO) module in the backbone network. Its receptive fields of multiple shapes naturally capture the long-range dependencies of an object without introducing excessive background noise, thereby extracting more accurate object features. Second, we design a multi-level decoupled detection head that separates object classification, bounding-box position regression, and bounding-box angle regression into three subtasks, which avoids the coupling problem in parameter regression. Within the angle regression branch, a phase-continuous encoding module converts the periodic angle value into a continuous cosine value, which keeps the loss value stable. 
Extensive experiments demonstrate that our method outperforms existing detection networks on four widely used remote sensing object detection datasets: DOTAv1.0, HRSC2016, UCAS-AOD, and DIOR-R.
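
To illustrate why encoding a periodic angle as cosine values removes the loss discontinuity, the following minimal sketch encodes an orientation angle as a small set of phase-shifted cosines and decodes it again. It is an illustration under stated assumptions rather than the encoding module of SODE-Net itself: the function names `encode_angle` and `decode_angle`, the use of three phase shifts, and the angle range with period π are all hypothetical choices made for this example.

```python
import numpy as np

def encode_angle(theta, num_phases=3):
    """Map an angle (radians, period pi) to phase-shifted cosine values.

    Assumption for this sketch: box orientation repeats every pi, so the
    angle is doubled to obtain a phase with period 2*pi before encoding.
    """
    theta = np.asarray(theta, dtype=float)
    shifts = 2.0 * np.pi * np.arange(num_phases) / num_phases
    return np.cos(2.0 * theta[..., None] + shifts)

def decode_angle(codes):
    """Recover the angle in (-pi/2, pi/2] from its cosine codes."""
    codes = np.asarray(codes, dtype=float)
    num_phases = codes.shape[-1]
    shifts = 2.0 * np.pi * np.arange(num_phases) / num_phases
    # For num_phases >= 3 these sums are proportional to -sin(phase)
    # and cos(phase), so arctan2 recovers the phase unambiguously.
    sin_sum = (codes * np.sin(shifts)).sum(axis=-1)
    cos_sum = (codes * np.cos(shifts)).sum(axis=-1)
    return 0.5 * np.arctan2(-sin_sum, cos_sum)

# Two nearly identical orientations on opposite sides of the angular boundary.
a = -np.pi / 2 + 0.01
b = np.pi / 2 - 0.01
raw_gap = abs(a - b)                                        # close to pi
code_gap = np.max(np.abs(encode_angle(a) - encode_angle(b)))  # small
print(f"raw angle gap: {raw_gap:.3f}, encoded gap: {code_gap:.3f}")
```

In this sketch, a regression loss computed on the raw angles would penalize the two boxes heavily even though they are almost parallel, whereas a loss computed on the encoded values stays small and varies smoothly across the boundary.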