Accurate surface defect classification is critical for industrial quality control. Although deep learning achieves strong results on individual datasets, most prior studies benchmark only a narrow set of models under inconsistent pipelines, which limits comparability and industrial relevance. This work introduces the first systematic benchmark of ten architectures across five public defect datasets (NEU-DET, X-SDD, KolektorSDD2, DAGM, MTDD) under a unified pipeline. The architectures span CNNs (CNN, ResNet18, ResNet50), lightweight models (MobileNetV2, SuperSimpleNet, GhostNet, EfficientNetV2), a Vision Transformer (Swin Transformer), a hybrid CNN–Transformer (CoAtNet), and a one-stage detector (YOLOv12). Swin Transformer and CoAtNet achieve the best performance (mean F1-scores of 90.8% and 85.5%, respectively), while EfficientNetV2 underperforms (41.9%), underscoring the need for domain-specific benchmarks. Lightweight models such as MobileNetV2, GhostNet, and SuperSimpleNet deliver competitive accuracy at much lower computational cost, offering practical options for edge deployment. By bridging the gap between academic benchmarks and manufacturing requirements, this study provides actionable guidance for selecting defect detection models in automated inspection.
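As an illustration of how a mean F1-score across datasets could be computed, the following minimal sketch averages per-dataset F1 values for a single model. It assumes macro-averaged F1 (the unweighted mean of per-class F1); the abstract does not state which averaging scheme is used. The function names, dataset keys, and toy labels are hypothetical and are not results from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in labels or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was t
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_f1_across_datasets(results):
    """results maps dataset name -> (y_true, y_pred) for one model."""
    per_dataset = {name: macro_f1(t, p) for name, (t, p) in results.items()}
    return sum(per_dataset.values()) / len(per_dataset), per_dataset


# Hypothetical toy labels (assumption for illustration only, not real data).
toy = {
    "dataset_a": ([0, 0, 1, 1], [0, 0, 1, 1]),
    "dataset_b": ([0, 0, 1, 1], [0, 1, 1, 1]),
}
mean_score, per_dataset = mean_f1_across_datasets(toy)
print(f"mean F1 = {mean_score:.3f}")
```

Averaging per-dataset scores with equal weight keeps a large dataset from dominating the summary; a sample-weighted mean would be an equally plausible choice if the datasets differ greatly in size.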
Silva et al. studied this question.