What question did this study set out to answer?

This research aims to systematically benchmark deep learning models for surface defect classification across multiple datasets.

March 22, 2026 | Open Access

Cross-Dataset Benchmarking of Deep Learning Models for Surface Defect Classification in Metal Parts

Key Points

This research aims to systematically benchmark deep learning models for surface defect classification across multiple datasets.
The benchmark covers ten architectures, including CNNs, Vision Transformers, and hybrid models.
Models are tested on five public defect datasets (NEU-DET, X-SDD, KolektorSDD2, DAGM, MTDD).
A unified evaluation pipeline keeps results comparable across models and datasets.
Swin Transformer (mean F1-score 90.8%) and CoAtNet (85.5%) achieve the best performance.
EfficientNetV2 underperforms markedly (41.9%).
Lightweight models such as MobileNetV2, GhostNet, and SuperSimpleNet deliver competitive accuracy with lower resource requirements.

Abstract

Accurate surface defect classification is critical for industrial quality control. Although deep learning achieves strong results on individual datasets, most prior studies benchmark only a narrow set of models under inconsistent pipelines, which limits comparability and industrial relevance. This work introduces the first systematic benchmark of ten architectures across five public defect datasets (NEU-DET, X-SDD, KolektorSDD2, DAGM, MTDD) under a unified pipeline. The architectures span CNNs (CNN, ResNet18/50), lightweight models (MobileNetV2, SuperSimpleNet, GhostNet, EfficientNetV2), Vision Transformers (Swin Transformer), a hybrid CNN–Transformer (CoAtNet), and a one-stage detector (YOLOv12). Results show that Swin Transformer and CoAtNet achieve the best performance (mean F1-scores of 90.8% and 85.5%), while EfficientNetV2 underperforms (41.9%), underscoring the need for domain-specific benchmarks. Lightweight models such as MobileNetV2, GhostNet, and SuperSimpleNet deliver competitive accuracy at much lower cost, offering practical solutions for edge deployment. By bridging the gap between academic benchmarks and manufacturing requirements, this study provides actionable guidance for selecting defect detection models in automated inspection.
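
To make the headline metric concrete, the following is a minimal sketch of how a mean F1-score across datasets could be computed. It rests on assumptions the abstract does not confirm: that F1 is macro-averaged over classes within each dataset, and that the reported mean is an unweighted average over datasets. The function names, dataset names, labels, and predictions are hypothetical toy values, not data from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_f1_across_datasets(results):
    """results maps a dataset name to a (y_true, y_pred) pair.

    Assumption: every dataset counts equally, regardless of its size.
    """
    per_dataset = {name: macro_f1(t, p) for name, (t, p) in results.items()}
    overall = sum(per_dataset.values()) / len(per_dataset)
    return per_dataset, overall


# Hypothetical toy predictions for one model on two made-up datasets.
toy_results = {
    "dataset_a": (["scratch", "pit", "pit", "scratch"],
                  ["scratch", "pit", "scratch", "scratch"]),
    "dataset_b": (["ok", "defect", "defect"],
                  ["ok", "defect", "defect"]),
}

per_dataset, overall = mean_f1_across_datasets(toy_results)
for name, score in per_dataset.items():
    print(f"{name}: macro F1 = {score:.3f}")
print(f"mean F1 across datasets = {overall:.3f}")
```

Averaging per dataset first, rather than pooling all images, keeps a large dataset from dominating the score. Whether the study weights datasets this way is not stated in the abstract.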
