Abstract

Self-supervised learning (SSL) has emerged as a pivotal paradigm in computer vision, particularly for applications where labeled data are limited, costly, or labor-intensive to obtain. This paradigm is especially relevant to sustainable agricultural systems, where large-scale manual annotation is impractical, and resource efficiency is a key objective. In agricultural vision tasks such as potato disease detection, quality assessment, and automated inspection, SSL has demonstrated strong potential to address the substantial visual variability encountered under real-world field conditions. Among recent SSL approaches, the Distillation with No Labels (DINO) framework, built upon Vision Transformers (ViTs) and multi-crop data augmentation, has shown remarkable capability in learning robust and transferable feature representations. In this study, we present a comprehensive analytical review of the DINO framework within the context of sustainable automated potato inspection systems in the Al Kharj region. The analysis systematically investigates architectural design choices, critical hyperparameters, teacher–student distillation dynamics, optimization behavior, and the mathematical mechanisms employed to prevent representation collapse. Using a potato leaf disease dataset, the DINO-ViT model is evaluated under rigorous experimental conditions. The results demonstrate that DINO-ViT effectively learns compact and semantically meaningful feature representations with clear inter-class separability, as evidenced by principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) visualizations, and quantitative class-separation metrics. Furthermore, when compared with conventional machine learning approaches, including Random Forest, Support Vector Machines, Gradient Boosting, and k-Nearest Neighbors, the DINO-ViT model achieves superior performance, attaining an accuracy of 0.8344 and an F1-score of 0.8334.
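
The teacher–student distillation and collapse-prevention mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the study's implementation. The function names (`log_softmax`, `dino_loss`, `update_center`, `ema_update`) are hypothetical, and the default temperatures and momenta are the values commonly cited for the original DINO method, assumed here because the abstract does not state the ones used in this work. The sketch shows the three ingredients usually credited with avoiding collapse: centering the teacher outputs, sharpening them with a low temperature, and updating the teacher as an exponential moving average (EMA) of the student.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_z for v in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher and the student.

    The temperatures are assumed defaults, not values taken from the study.
    """
    # Centering: subtract a running mean so no single dimension dominates.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    # Sharpening: a low teacher temperature yields a peaked target distribution.
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))


def update_center(center, teacher_batch, momentum=0.9):
    """EMA of the mean teacher output over a batch (list of logit vectors)."""
    batch_mean = [sum(col) / len(teacher_batch) for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b
            for c, b in zip(center, batch_mean)]


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights follow the student via EMA; no gradient reaches the teacher."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a full training loop the loss would be summed over pairs of different augmented views (global crops for the teacher, all crops for the student), which this sketch omits for brevity.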
Tarek et al. (Tue,) studied this question.