In industrial imaging, semiconductor wafer defect classification is crucial for chip manufacturing yield and reliability. However, several challenges persist: weak imaging responses and loss of detail during downsampling, complex backgrounds that interfere with feature extraction, and the trade-off between performance and efficiency on edge devices. Traditional CNNs and ViTs are limited in modeling long-range dependencies and in keeping edge deployment costs manageable. To address these issues, we build on VMamba, a visual state space model (SSM) that achieves global contextual modeling with linear computational complexity. We propose FCS-VMamba, a domain-adapted model that integrates three core modules: Frequency Attention (FA), Cross-Layer Cross-Attention (CLCA), and Saliency Feature Suppression (SFS). Experimental results show that FCS-VMamba achieves 86.06% macro-precision and 87.91% Top-1 accuracy with only 1.2 M parameters. These results demonstrate that FCS-VMamba provides a practical and parameter-efficient baseline for industrial wafer defect recognition.
Zhang et al. studied this question.
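The abstract reports two headline metrics, macro-precision and Top-1 accuracy, which can diverge when defect classes are imbalanced. The following is a minimal sketch of how the two are commonly computed, not the authors' evaluation code. The function names, the class labels, and the toy predictions are hypothetical, and scoring a class that is never predicted as 0 is an assumed convention.

```python
from collections import Counter


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision, TP / (TP + FP).

    Assumption: a class that is never predicted contributes 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    true_positives = Counter()
    predicted = Counter()
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_positives[p] += 1
    per_class = [
        true_positives[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not taken from any wafer dataset.
y_true = ["scratch", "scratch", "particle", "particle", "particle", "none"]
y_pred = ["scratch", "particle", "particle", "particle", "none", "none"]

accuracy = top1_accuracy(y_true, y_pred)
precision = macro_precision(y_true, y_pred)
print(f"Top-1 accuracy:  {accuracy:.4f}")
print(f"Macro-precision: {precision:.4f}")
```

Because macro-precision weights every class equally regardless of its sample count, a rare defect class with poor precision lowers it more than it lowers Top-1 accuracy. This is one reason the two figures are reported side by side.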