Sustainable precision agriculture is crucial for optimizing resource utilization, reducing chemical inputs, and ensuring global food security. High-precision automatic recognition and monitoring of key crop organs (e.g., wheat heads and flower clusters) serve as the technological foundation for sustainable agricultural management decisions. However, visual perception in natural field environments is highly susceptible to external conditions. To address the severe background interference and feature dilution that affect small-object detection of crops in complex agricultural scenes, this paper proposes ACF-YOLO, an enhanced detection network based on YOLO11. First, an Aggregated Multi-scale Local-Global Attention (AMLGA) module is designed to enhance the feature representation of weak targets by fusing local details with global semantics. Second, a Context-Guided Fusion Module (CGFM) and a Soft-Neighbor Interpolation (SNI) strategy are introduced; together they alleviate feature aliasing and align deep semantic information with shallow spatial details. Third, the Inner-MPDIoU loss function is employed to improve bounding box regression accuracy for non-rigid targets by incorporating geometric constraints and auxiliary scale factors. To verify the detection capability of the proposed method, we constructed a UAV Wheat Head Dataset (UWHD) and conducted extensive experiments on the UWHD, GWHD2021, and RFRB datasets. The experimental results demonstrate that ACF-YOLO outperforms the compared methods, confirming its stable detection performance and its potential contribution to the sustainable development of agriculture.
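To make the role of the "geometric constraints and auxiliary scale factors" in the Inner-MPDIoU loss more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly published formulations: MPDIoU penalizes the squared distances between the top-left and bottom-right corners of the two boxes, normalized by the squared image diagonal, and Inner-IoU computes the overlap on auxiliary boxes rescaled about their centers. The function names, the `(x1, y1, x2, y2)` box layout, and the default `ratio=0.75` are illustrative assumptions; the exact combination used in ACF-YOLO may differ.

```python
def iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same center as `box`, width and height scaled by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    hw, hh = (box[2] - box[0]) * ratio / 2.0, (box[3] - box[1]) * ratio / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def inner_mpdiou_loss(pred, gt, img_w, img_h, ratio=0.75):
    """Sketch of an Inner-MPDIoU loss (assumed form, see lead-in).

    loss = 1 - IoU_inner + d1^2 / (w^2 + h^2) + d2^2 / (w^2 + h^2)
    where d1 / d2 are the distances between the top-left / bottom-right
    corners of the original boxes and (w, h) is the input image size.
    """
    inner_iou = iou(scale_box(pred, ratio), scale_box(gt, ratio))
    norm = float(img_w ** 2 + img_h ** 2)
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    return 1.0 - inner_iou + d1_sq / norm + d2_sq / norm


if __name__ == "__main__":
    gt_box = (100.0, 100.0, 120.0, 130.0)    # a small target, e.g. a wheat head
    pred_box = (104.0, 102.0, 124.0, 131.0)  # a slightly shifted prediction
    print(inner_mpdiou_loss(pred_box, gt_box, img_w=640, img_h=640))
```

In this sketch, a `ratio` below 1 shrinks the auxiliary boxes, so the overlap term reacts more sharply to small misalignments. The corner-distance terms still provide a gradient when the auxiliary boxes no longer overlap.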
Li et al. studied this question.