What question did this study set out to answer?

The research aims to provide a comprehensive dataset for plant disease segmentation to enhance detection and localization.

February 11, 2026 · Open Access

A Large-Scale In-the-wild Dataset for Plant Disease Segmentation

Key Points

The research aims to provide a comprehensive dataset for plant disease segmentation to enhance detection and localization.
Introduced PlantSeg, a large-scale segmentation dataset with 7,774 images across 115 disease categories.
Provided pixel-level annotation masks for precise disease localization.
Collected in-the-wild images to represent real-world conditions effectively.
Demonstrated improved performance of segmentation models on real-world plant disease images.
Showcased that models trained on PlantSeg outperform those trained on smaller or laboratory datasets.

Abstract

Plant diseases pose serious threats to agricultural productivity and can significantly reduce crop yield and quality [1]. Globally, an estimated 20–40% of crop yield is lost to plant diseases. According to the Food and Agriculture Organization of the United Nations [2], the annual losses exceed 220 billion dollars. Hence, accurate detection and assessment of plant diseases at an early stage play a crucial role in minimizing economic losses. Traditionally, manual diagnosis is considered the most reliable method of assessment. However, plant pathologists might not always be available to carry out assessments in a timely manner. Moreover, because each pathologist typically specializes in only a few plant diseases, a reliable diagnosis often requires consulting several of them.
Automatic plant disease localization and segmentation is therefore important in precision agriculture [3]. Generic image segmentation methods [4, 5, 6, 7] have demonstrated outstanding performance on natural image datasets such as ADE20k [8], Cityscapes [9] and MSCOCO [10]. In contrast, segmenting plant diseases remains challenging, because the symptoms visible in images can be subtle and diverse: small spots, discoloration, and minor textural changes. Furthermore, because large-scale plant disease segmentation datasets are lacking [11, 12, 13, 14, 15, 16, 17, 18], existing generic segmentation methods struggle to address this challenge. To illustrate, we provide statistics for representative existing plant disease datasets in Table 1. We observe that existing datasets are commonly limited in three respects: annotation type, image source, and dataset scale.
In this paper, we present a large-scale in-the-wild plant disease segmentation dataset, named PlantSeg, to better support practical disease detection and localization applications. Specifically, we address the shortfalls of existing datasets in the following three aspects:
Annotation Types. Unlike most existing plant disease datasets, which contain only classification labels or object detection bounding boxes, PlantSeg provides pixel-level annotation masks. Classification labels provide only image-level disease information, and detection bounding boxes provide only coarse-grained locations of plant diseases. PlantSeg instead provides segmentation masks that pinpoint the precise, fine-grained locations of diseased areas. Annotations were carried out under the supervision of experienced plant experts.
Image Sources. PlantSeg is composed of in-the-wild images, whereas most existing datasets [15, 16, 19, 20] consist only of images collected under controlled laboratory conditions. We illustrate the differences between laboratory and in-the-wild images in Fig. 1. PlantSeg images contain entire plants, varying lighting conditions, occlusions, and complex backgrounds; they therefore better represent real-world disease detection and segmentation scenarios and are more suitable for training models for practical applications.
Fig. 1. Examples of images from PlantVillage [19] and our PlantSeg dataset. Each PlantVillage image contains a single leaf against a uniform background, whereas images in our dataset feature much more complex backgrounds, various viewpoints, and different lighting conditions.
Dataset Scale. PlantSeg surpasses existing datasets [21, 22] in the number of images, plant types, and disease classes. Plant diseases vary widely because they arise from morphological changes in growth, appearance, or development, so small-scale datasets usually fail to capture the diversity of real-world diseases. PlantSeg addresses this issue by introducing 7,774 images across 115 disease categories, thereby better representing the diversity of disease conditions.
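The difference between the three annotation types can be made concrete with a small sketch. The example below is illustrative only and is not taken from the PlantSeg tooling: the toy mask, the label convention (0 for background, 1 for a disease class), and the helper names `image_level_label`, `bounding_box`, and `diseased_fraction` are all assumptions. It shows that a pixel-level mask can be reduced to a bounding box or to an image-level label, while the reverse is not possible.

```python
import numpy as np

# Hypothetical 6x8 segmentation mask (assumed convention):
# 0 = background or healthy tissue, 1 = one disease class.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:3, 2:4] = 1  # a small lesion covering 4 pixels
mask[4, 6] = 1      # a second, isolated spot

def image_level_label(mask):
    """Classification-style annotation: is any diseased pixel present?"""
    return bool(mask.any())

def bounding_box(mask):
    """Detection-style annotation: tightest box enclosing all diseased pixels,
    as (row_min, col_min, row_max, col_max) with inclusive bounds."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

def diseased_fraction(mask):
    """Share of image pixels marked as diseased; only a mask supports this."""
    return float((mask > 0).mean())

label = image_level_label(mask)
box = bounding_box(mask)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
lesion_pixels = int(mask.sum())

print("image-level label:", label)
print("bounding box:", box, "covering", box_area, "pixels")
print("diseased pixels in mask:", lesion_pixels)
print("diseased fraction of image:", diseased_fraction(mask))
```

In this toy case a single box has to enclose both lesions, so most of its area is healthy tissue, whereas the mask marks only the diseased pixels. That extra precision is what makes quantities such as diseased area computable from segmentation annotations.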
In this paper, we describe the characteristics of PlantSeg and benchmark state-of-the-art segmentation models on plant disease segmentation, demonstrating that our dataset is a comprehensive benchmark for developing plant disease segmentation methods. With the newly curated dataset, we are able to train segmentation models that generalize promisingly to real-world plant disease images. This addresses the poor generalization of models trained on laboratory datasets or on small-scale real-world datasets. The trained plant disease segmentation models can be applied in automated precision agriculture systems, for example to quarantine affected areas within paddocks and to adjust fungicide application rates to minimize the spread of disease.
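Benchmarking segmentation models requires a quantitative score. The text above does not name the metric used, so the sketch below assumes mean intersection-over-union (mIoU), a common choice for semantic segmentation. The function name `per_class_iou` and the toy label maps are hypothetical and serve only to show how such a score could be computed.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return the IoU for each class id in [0, num_classes).
    Classes absent from both prediction and ground truth yield NaN."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            ious.append(float("nan"))
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return np.array(ious)

# Toy 3x4 label maps (assumed ids: 0 = background, 1 and 2 = two disease classes).
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 0]])
pred = np.array([[0, 0, 1, 0],
                 [0, 0, 1, 1],
                 [0, 2, 0, 0]])

ious = per_class_iou(pred, target, num_classes=3)
miou = float(np.nanmean(ious))  # average over classes that actually occur

print("per-class IoU:", ious)
print("mIoU:", miou)
```

Averaging per class rather than per pixel keeps small lesions from being drowned out by the background, which matters when symptoms occupy only a small part of the image.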
