What question did this study set out to answer?

The aim is to develop a diagnostic framework that enhances the recognition of tomato leaf diseases by integrating multimodal data from visual and textual sources.

February 14, 2026 · Open Access

LPDiag: LLM-Enhanced Multimodal Prototype Learning Framework for Intelligent Tomato Leaf Disease Diagnosis

Key points

Developed a multimodal prototype-attention framework called LPDiag.
Utilized large language models (LLMs) for semantic understanding of disease descriptions.
Extracted multi-scale visual features using an enhanced Res2Net backbone.
Integrated knowledge-enhanced attention mechanisms for better interaction between textual and visual data.
Created an interactive diagnostic system for natural language queries and image-based identification.
Achieved a mean accuracy of 98.83% on three datasets.
Outperformed state-of-the-art models in tomato disease identification.
Provided improved interpretability and explanatory capability of the model.

Abstract

Tomato leaf diseases exhibit subtle inter-class differences and substantial intra-class variability, making accurate identification challenging for conventional deep learning models, especially under real-world conditions with diverse lighting, occlusion, and growth stages. Moreover, most existing approaches rely solely on visual features and lack the ability to incorporate semantic descriptions or expert knowledge, limiting their robustness and interpretability. To address these issues, we propose LPDiag, a multimodal prototype-attention diagnostic framework that integrates large language models (LLMs) for fine-grained recognition of tomato diseases. The framework first employs an LLM-driven semantic understanding module to encode symptom-aware textual embeddings from disease descriptions. These embeddings are then aligned with multi-scale visual features extracted by an enhanced Res2Net backbone, enabling cross-modal representation learning. A set of learnable prototype vectors, combined with a knowledge-enhanced attention mechanism, further strengthens the interaction between visual patterns and LLM prior knowledge, resulting in more discriminative and interpretable representations. Additionally, we develop an interactive diagnostic system that supports natural-language querying and image-based identification, facilitating practical deployment in heterogeneous agricultural environments. Extensive experiments on three widely used datasets demonstrate that LPDiag achieves a mean accuracy of 98.83%, outperforming state-of-the-art models while offering improved explanatory capability. The proposed framework offers a promising direction for integrating LLM-based semantic reasoning with visual perception to enhance intelligent and trustworthy plant disease diagnostics.
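The abstract describes learnable prototype vectors that attend over fused visual and textual embeddings, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' method. Everything specific in it is an assumption made for illustration: the function name `prototype_attention`, fusion by averaging normalized embeddings, cosine similarity scoring, the temperature of 0.1, the toy 4-dimensional vectors, and the class labels.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_attention(visual, text, prototypes, temperature=0.1):
    """Attend over class prototypes with a fused visual + text query.

    Assumption: modalities are fused by averaging their normalized
    embeddings; the paper's actual fusion mechanism is not specified here.
    Returns (attention weights, attention-weighted prototype mixture).
    """
    v = l2_normalize(visual)
    t = l2_normalize(text)
    query = l2_normalize([(a + b) / 2 for a, b in zip(v, t)])
    protos = [l2_normalize(p) for p in prototypes]
    # Cosine similarity between the query and each prototype, sharpened
    # by the temperature before the softmax.
    scores = [dot(query, p) / temperature for p in protos]
    weights = softmax(scores)
    attended = [
        sum(w * p[i] for w, p in zip(weights, protos))
        for i in range(len(query))
    ]
    return weights, attended


# Hypothetical class labels and hand-written toy vectors, for illustration only.
labels = ["early_blight", "late_blight", "healthy"]
prototypes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
visual = [0.1, 0.9, 0.2, 0.0]  # stand-in for an image-backbone feature
text = [0.0, 1.0, 0.1, 0.1]    # stand-in for an LLM text embedding

weights, attended = prototype_attention(visual, text, prototypes)
predicted = max(range(len(weights)), key=weights.__getitem__)
print(labels[predicted], [round(w, 3) for w in weights])
```

In this toy setup the attention weights double as a rough explanation of the prediction, since each weight shows how strongly the fused query matches a given class prototype. In a trained model the prototypes would be learned parameters rather than fixed vectors.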

Cite This Study

Dong et al. studied this question.

synapsesocial.com/papers/699011812ccff479cfe5832e https://doi.org/10.3390/agriculture16040419