Accurate assessment of cotton emergence rates is essential for precision agriculture management, and unmanned aerial vehicle (UAV) imagery provides a scalable means of field-level monitoring. However, detecting cotton seedlings in UAV images faces persistent challenges. Individual seedlings appear as small targets with diverse morphologies across varying flight altitudes. Strong plastic-film reflections, weeds, and soil cracks introduce substantial background interference. “Missing seedling” targets, which manifest as negative-space features, closely resemble background noise. Existing CNN–Transformer hybrid detectors are limited in three ways: fixed convolutional receptive fields cannot adapt to multi-scale target variation, attention mechanisms lack explicit directional geometric modeling, and interpolation-based upsampling attenuates the high-frequency edge details of small targets. To address these issues, this paper proposes DDF-DETR (Dynamic-Direction-Frequency Detection Transformer), a multi-scale spatial-context detection method built on RT-DETR. The method incorporates three components: a Dynamic Gated Mixer Block (DGMB) for adaptive multi-scale feature extraction with background-noise suppression; a Direction-Aware Adaptive Transformer Encoder (DAATE) for directional geometric feature modeling at linear computational complexity; and a Frequency-Aware Sub-pixel Upsampling Network (FASN) for recovering high-frequency detail in the feature pyramid. On a self-constructed Xinjiang cotton field dataset, DDF-DETR achieves 83.72% mAP@0.5 and 63.46% mAP@0.5:0.95. These are improvements of 2.38% and 5.28% over the baseline RT-DETR-R18. DDF-DETR also reduces the parameter count by 30.6% and the computational cost to 42.8 GFLOPs. Generalization experiments on the VisDrone2019 and TinyPerson datasets further validate the robustness of the proposed method for small-target detection across different scenarios.
Xu et al. studied this question.
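The abstract contrasts interpolation-based upsampling with sub-pixel upsampling. The sketch below shows the generic sub-pixel (pixel-shuffle) idea, assuming NumPy. A projection first produces `C_out * r * r` channels. These channels are then rearranged into an `r`-times larger spatial grid, so each output pixel comes from a learned channel rather than an interpolated neighbor. This is not FASN itself. The frequency-aware part is not modeled, the 1x1 projection stands in for whatever convolution the paper uses, and the function names `pixel_shuffle` and `subpixel_upsample` are hypothetical.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Follows the usual pixel-shuffle convention:
    out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    """
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    # Split the channel axis into (C, r, r) sub-pixel offsets.
    x = x.reshape(c, r, r, h, w)
    # Interleave each r x r offset grid with the spatial axes: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

def subpixel_upsample(feat: np.ndarray, weight: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical sub-pixel upsampler: 1x1 projection, then pixel shuffle.

    feat:   (C_in, H, W) feature map
    weight: (C_out*r*r, C_in) weights of an assumed 1x1 convolution
    """
    # A 1x1 convolution is a per-pixel linear map over channels.
    projected = np.einsum("oc,chw->ohw", weight, feat)
    return pixel_shuffle(projected, r)

# Usage: four 1x1 channels become one 2x2 map. Each channel fills a
# distinct sub-pixel position instead of being smoothed by interpolation.
feat = np.arange(4, dtype=float).reshape(4, 1, 1)
up = subpixel_upsample(feat, np.eye(4), r=2)
print(up.shape)  # (1, 2, 2)
```

Because the learned channels decide every sub-pixel value, this family of upsamplers can in principle preserve sharp edges that bilinear or nearest-neighbor interpolation would blur. How FASN adds frequency awareness on top of this is specific to the paper and not reproduced here.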