With the rapid development of neural networks, strip steel surface defect detection, an important task in computer vision, has achieved remarkable progress. However, state-of-the-art methods still face a tradeoff between accuracy and efficiency: high-performing models are usually large and computationally expensive, whereas lightweight models often suffer from limited detection accuracy. To address this issue, we first propose a spatial channel enhancement (SCE) module, which consists of a reparameterizable depthwise large-kernel convolution and a reparameterizable pointwise (RepPw) convolution. The SCE module enlarges the receptive field and strengthens long-range spatial and channel interactions while preserving computational efficiency. Based on the SCE module, we propose a novel lightweight saliency model for strip steel surface defects, namely the reparameterization-driven depthwise separable large-kernel network (RepDSLKNet). RepDSLKNet employs SCE modules to build an encoder and a decoder, and uses cascaded channel attention (CCA) modules for feature fusion. This lightweight architecture effectively extracts and fuses the semantic information and detailed features of strip steel surface defects, thereby improving detection accuracy and speed with a small model size. With an input size of 224 × 224, RepDSLKNet has only 0.47 M parameters and 0.42 G FLOPs during inference. Compared with current state-of-the-art methods, our approach achieves a 19-fold improvement in throughput and a twofold reduction in latency. Experiments on two public strip steel defect datasets demonstrate that RepDSLKNet delivers competitive performance against state-of-the-art methods.
Zhou et al. studied this question.
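The idea behind a reparameterizable depthwise large-kernel convolution can be illustrated with a minimal sketch. The abstract does not specify the branch layout of the SCE module, so the example below assumes a common structural reparameterization pattern: a 5 × 5 kernel trained alongside a parallel 3 × 3 branch, with no batch normalization. The helper names (`conv2d_same`, `pad_kernel`, `merge_branches`) and all kernel values are hypothetical. Because convolution is linear, the small kernel can be zero-padded to the large size and added to the large kernel, giving a single inference-time convolution with the same output as the two-branch form.

```python
# Illustrative sketch only: the branch layout (5x5 + 3x3, no batch norm) is an
# assumption, not taken from the paper.

def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Zero-pad a small odd-sized kernel to size x size, keeping it centered."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[i + off][j + off] = kernel[i][j]
    return padded


def merge_branches(large, small):
    """Fold a parallel small-kernel branch into the large kernel (inference form)."""
    n = len(large)
    small_padded = pad_kernel(small, n)
    return [[large[i][j] + small_padded[i][j] for j in range(n)] for i in range(n)]


# One channel of a depthwise convolution: each channel is filtered independently,
# so a single channel is enough to show the merge.
image = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(6)]
large = [[0.01 * (i * 5 + j) for j in range(5)] for i in range(5)]
small = [[0.1 * (i - j) for j in range(3)] for i in range(3)]

# Training-time form: two parallel branches whose outputs are summed.
a = conv2d_same(image, large)
b = conv2d_same(image, small)
two_branch = [[a[i][j] + b[i][j] for j in range(6)] for i in range(6)]

# Inference-time form: a single convolution with the merged kernel.
merged = conv2d_same(image, merge_branches(large, small))

max_diff = max(
    abs(two_branch[i][j] - merged[i][j]) for i in range(6) for j in range(6)
)
print(max_diff < 1e-9)
```

The merged form runs one convolution instead of two, which is why the extra training-time branches add no inference cost in this sketch. A pointwise (1 × 1) convolution can be merged with parallel linear branches in the same way, since it is also a linear map over channels.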