In drone-captured aerial imagery, object detection is hampered by a high proportion of small objects, complex background interference, and poor lighting, all of which degrade feature representation and detection accuracy. To address these challenges, a novel object detection algorithm named channel attention and fine-grained enhancement YOLO (CAFE-YOLO) is proposed. The algorithm incorporates a channel attention mechanism into the backbone network to strengthen the focus on critical features while suppressing redundant information. A fine-grained feature enhancement module is also introduced to extract local detail features, improving the perception of small and occluded objects. In the detection head, a lightweight attention-guided feature fusion strategy is designed to further improve object localization and classification. Experimental results on the VisDrone2019 dataset show that the proposed method outperforms most existing advanced algorithms in complex drone-captured scenes. While maintaining a lightweight architecture, it reaches a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 44.6%, indicating improvements in both overall detection accuracy and robustness.
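The reported metric depends on an IoU threshold of 0.5, which is particularly demanding for small objects. The following is a minimal sketch of that matching criterion only, not the full mAP computation (which also ranks detections by confidence and averages precision over classes) and not the authors' evaluation code. The names `iou` and `is_true_positive` and the corner-coordinate box format are assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box if their IoU meets the threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    small_truth = (0, 0, 10, 10)      # 10x10 px object
    large_truth = (0, 0, 100, 100)    # 100x100 px object

    # A 4 px horizontal shift breaks the match for the small box
    # (IoU = 60/140) but barely affects the large one (IoU = 9600/10400).
    print(is_true_positive((4, 0, 14, 10), small_truth))    # False
    print(is_true_positive((4, 0, 104, 100), large_truth))  # True
```

The same 4-pixel localization error that is negligible for a large object turns a small-object detection into a miss, which is one reason small-object-heavy datasets tend to yield lower mAP@0.5 scores.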
Gao et al. studied this question.