This paper reports the empirical performance of few-shot learning (FSL) for visual defect classification on confidential industrial datasets. We evaluate 16 combinations of four backbone models (Perception Encoder, DINOv2, DINOv3, ConvNeXt-V2) and four FSL classifiers (Prototypical Networks, Neighborhood Component Analysis, Relation Networks, Linear Adapter) under three conditions: a baseline comparison, deterministic support set augmentation, and a learnable attention preprocessor. The results show that support set augmentation is highly effective, improving performance in nearly all configurations. Among the backbones, DINOv2 and ConvNeXt-V2-T were the top performers: DINOv2 gave the most competitive results, and ConvNeXt-V2-T reached the highest accuracy. These findings suggest that, for industrial FSL applications, combining a strong pre-trained backbone with a simple augmentation strategy is a practical way to build data-efficient classification systems.
Molek et al. (Thu,) studied this question.
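To make the combination of a frozen backbone, deterministic support set augmentation, and a Prototypical Networks classifier concrete, the following is a minimal sketch and not the authors' implementation. Several pieces are assumptions made for illustration: the augmentation set (original plus horizontal and vertical flips) is not specified in the text above, `embed` is a hypothetical stand-in for a frozen backbone such as DINOv2 (here just a linear projection followed by L2 normalization), and images are assumed to be grayscale arrays of shape `(N, H, W)`.

```python
import numpy as np


def embed(images: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen backbone: flatten, project, L2-normalize."""
    feats = images.reshape(len(images), -1) @ proj
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def augment_support(images: np.ndarray, labels: np.ndarray):
    """Deterministic augmentation (assumed: original, horizontal flip, vertical flip)."""
    views = [images, images[:, :, ::-1], images[:, ::-1, :]]
    # Labels are repeated once per view, in the same order as the concatenated views.
    return np.concatenate(views), np.tile(labels, len(views))


def prototypes(feats: np.ndarray, labels: np.ndarray):
    """One prototype per class: the mean embedding of that class's support samples."""
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query_feats: np.ndarray, classes: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Assign each query to the class of its nearest prototype (squared Euclidean distance)."""
    dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

In an episode, the support images would first pass through `augment_support`, then `embed`, then `prototypes`; query images are embedded without augmentation and passed to `classify`. Because the augmentation only enlarges the support set before prototypes are averaged, it adds no trainable parameters, which is one plausible reason such a strategy is cheap to apply across all backbone and classifier combinations.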