Wafer defect analysis is critical to semiconductor manufacturing, but labeled data are scarce and class distributions are highly imbalanced. We present a semi-supervised framework built on two lightweight hybrid CNN–Transformer models, one for wafer defect classification and one for segmentation. For classification, HybridCNN-ViT combines CNN-based local feature extraction with Transformer-based global context modeling and adopts a three-stage progressive pseudo-labeling strategy to exploit unlabeled samples. The pseudo-label selection mechanism is systematically calibrated to improve pseudo-label reliability when labeled data are limited. For segmentation, ConvoFormer-UNet integrates convolution-enhanced embeddings with Transformer blocks to balance boundary detail against global context. On the public WM-811K dataset, HybridCNN-ViT achieves 98.72% accuracy and a macro-AUC of 0.9985 in the semi-supervised classification setting, while ConvoFormer-UNet reaches 99.19% IoU for segmentation with fewer parameters than several baselines. We also report single-GPU inference efficiency to illustrate practical speed.
Shi et al. (Mon,) studied this question.
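
To make the idea of staged pseudo-label selection concrete, the following is a minimal sketch of confidence-thresholded selection run over three passes. It is not the authors' implementation: the threshold values, the descending schedule, the function names, and the `predict` callable are all assumptions made for illustration, and the retraining step that would normally occur between stages is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-stage confidence thresholds (assumed, not from the source).
STAGE_THRESHOLDS = (0.99, 0.95, 0.90)


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for predictions at or above threshold."""
    selected = []
    for i, p in enumerate(probs):
        # Index of the most probable class for this sample.
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= threshold:
            selected.append((i, best))
    return selected


def progressive_pseudo_labeling(
    predict: Callable[[list], List[Sequence[float]]],
    unlabeled: Sequence,
    thresholds: Sequence[float] = STAGE_THRESHOLDS,
):
    """Run one selection pass per stage, removing accepted samples from the pool.

    `predict` is a hypothetical callable that maps a list of samples to a list
    of class-probability vectors. A real pipeline would retrain the model on
    the accepted samples between stages; that step is left out here.
    """
    pool = list(unlabeled)
    accepted = []
    for threshold in thresholds:
        if not pool:
            break
        probs = predict(pool)
        picks = select_pseudo_labels(probs, threshold)
        picked_idx = {i for i, _ in picks}
        accepted.extend((pool[i], label) for i, label in picks)
        # Samples that were not confident enough stay in the pool for the next stage.
        pool = [x for i, x in enumerate(pool) if i not in picked_idx]
    return accepted, pool
```

Under these assumptions, each stage admits only the predictions that clear its threshold, and low-confidence samples remain unlabeled rather than being forced into a class. This is one common way to limit label noise when labeled data are scarce and classes are imbalanced.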