Using multi-source sensing data for intelligent perception and refined management of farmland has become an important research direction in modern agriculture. However, traditional inspection approaches that rely solely on visual information are highly susceptible to illumination changes, occlusion, and background interference, which makes stable pest detection and accurate crop growth assessment difficult. To address these problems, we propose a multimodal target perception network for intelligent farmland inspection. It integrates UAV imagery, ground environmental sensor data, and spatial location information to jointly perceive farmland pests, diseases, and crop growth status. The framework introduces three components: cross-modal alignment and collaborative encoding mechanisms, a multi-scale target perception structure, and a dynamic multimodal fusion strategy. Together, these components model information from all modalities within a unified semantic space. Experiments on a self-constructed multimodal farmland dataset show that the proposed method achieves 87.53% Precision and 89.16% mAP on the pest and disease detection task and 88.04% Accuracy on the crop growth assessment task. These scores significantly outperform several mainstream visual detection models and multimodal fusion approaches. The results indicate that the framework can substantially improve the robustness of farmland inspection systems, providing an effective technical pathway for AI-driven decision-making in precision agriculture. By bridging production-side sensing data and e-commerce demand, the approach also offers a practical technical solution for agricultural production-marketing synergy, the realization of quality premiums, and digital rural revitalization.
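The abstract names a dynamic multimodal fusion strategy but does not specify how it works. The minimal Python sketch below shows one common way such fusion can be realized. It rests on several assumptions. First, each modality (UAV imagery, ground sensor readings, spatial location) has already been encoded into a shared d-dimensional embedding. Second, the fusion weights come from a softmax over per-modality scores produced by a hypothetical learned gate vector. The functions `softmax` and `dynamic_fusion`, the `gate` parameter, and the modality keys are all illustrative assumptions. They are not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(features: Dict[str, Vector], gate: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Fuse per-modality embeddings with input-dependent softmax weights (hypothetical sketch).

    features: modality name -> embedding, assumed already projected into a shared space
    gate: hypothetical learned scoring vector with the same dimension as the embeddings
    Returns the fused embedding and the weight assigned to each modality.
    """
    names = list(features)
    dim = len(gate)
    for name in names:
        if len(features[name]) != dim:
            raise ValueError(f"embedding for {name!r} does not match gate dimension {dim}")

    # Score each modality by the dot product of its embedding with the gate vector.
    scores = [sum(g * x for g, x in zip(gate, features[n])) for n in names]
    weights = softmax(scores)

    # Weighted sum of the embeddings; the weights are recomputed for every input.
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, x in enumerate(features[n]):
            fused[i] += w * x
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for encoder outputs (illustrative values only).
    feats = {"uav": [1.0, 0.0], "sensor": [0.0, 1.0], "location": [0.5, 0.5]}
    fused_vec, modality_weights = dynamic_fusion(feats, gate=[1.0, 0.0])
    print(fused_vec, modality_weights)
```

Because the weights depend on the input, a modality whose embedding scores poorly against the gate would contribute less to the fused vector. Imagery degraded by poor lighting is one possible case. In a trained network, the gate would typically be learned jointly with the encoders rather than fixed by hand as it is here.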
Li et al. studied this question.