Weed–crop object detection in UAV field imagery faces several significant challenges, including a large proportion of small objects, dense occlusion, similar texture appearance between weeds and crops, and strong background interference. These challenges often lead to missed detections, localization drift, and unstable training under edge-device resource budgets. To improve detection accuracy while maintaining a practical accuracy–efficiency trade-off in complex farmland scenes, we propose the Dual-Driven Texture–Semantic Fusion Network (D2FNet), which consists of four components: a Texture–Semantic Backbone (TSB); an efficient operator, MCF-A2C2f; a cross-scale adaptive fusion and feature-redistribution module, DSSA-Head; and a scale-aware reweighting block, PSBL. TSB reduces the discriminative ambiguity caused by similar weed–crop appearance and complex background textures. MCF-A2C2f limits the additional cost of the dual-driven design through lightweight operator substitution while largely preserving per-scale representations. DSSA-Head addresses the multi-scale representation inconsistency induced by abundant small objects and large scale variation in field scenes. PSBL down-weights low-quality positive samples according to their sample quality, stabilizing box regression and training. Experimental results show that on the WeedCrop Image Dataset, D2FNet-n improves mAP50–95 over the YOLOv12-n baseline from 36.6% to 44.1% (+7.5 percentage points); on the auxiliary Sesame Crop & Weed Dataset, mAP50–95 increases from 62.2% to 70.1% (+7.9 percentage points). These results indicate that D2FNet achieves stable accuracy gains under comparable parameter and computation budgets, rather than pursuing the smallest absolute model size, and shows promising cross-dataset robustness on the evaluated benchmarks.
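To make the reported gains concrete, the short Python sketch below separates the absolute improvement (in percentage points) from the relative improvement, using only the mAP50–95 figures stated above. It assumes the common COCO-style definition of mAP50–95, in which AP is averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; the helper names `map50_95` and `gains` are hypothetical and are not part of D2FNet or any evaluation toolkit.

```python
# Reported mAP50-95 values (%) from the abstract: (YOLOv12-n baseline, D2FNet-n).
RESULTS = {
    "WeedCrop Image Dataset": (36.6, 44.1),
    "Sesame Crop & Weed Dataset": (62.2, 70.1),
}

# Assumed COCO-style IoU thresholds behind mAP50-95: 0.50, 0.55, ..., 0.95 (10 values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map50_95(ap_per_threshold):
    """Hypothetical helper: average per-threshold AP values into one mAP50-95 score."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


def gains(baseline, improved):
    """Hypothetical helper: return (absolute gain in percentage points, relative gain in %)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for name, (base, ours) in RESULTS.items():
    pp, rel = gains(base, ours)
    print(f"{name}: +{pp} pp absolute, +{rel}% relative")
```

Under these assumptions, the +7.5 and +7.9 figures are absolute differences in percentage points; expressed relative to the baseline, they correspond to roughly 20.5% and 12.7% improvements, respectively.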