This paper presents a comparative study of deep learning methods for classifying roller bearing faults. Experimental vibration data were obtained from controlled tests on roller bearings with various faults under different load conditions, chosen to simulate real-world industrial operation. The methodology first evaluates four time-frequency techniques (short-time Fourier transform, STFT; continuous wavelet transform, CWT; wavelet packet transform, WPT; and constant-Q nonstationary Gabor transform, CQ-NSGT) and identifies CQ-NSGT as the best of the four for feature visualization. A novel hybrid transfer learning architecture is then proposed that integrates pre-trained backbones (EfficientNetB0, MobileNetV2, InceptionV3) with custom residual blocks. These hybrid CNNs are compared with a baseline CNN and a Vision Transformer (ViT). All models were evaluated using fivefold cross-validation and tested under additive white Gaussian noise at signal-to-noise ratios (SNRs) of 1 dB, 3 dB, and 5 dB. The EfficientNetB0 hybrid model reached the highest baseline accuracy, 99.83%, and was also highly robust, maintaining 98.8% accuracy even at 1 dB SNR. MobileNetV2 was the most computationally efficient model, with a training time of only 121.4 s and 0.50 GFLOPS, which makes it well suited to edge deployment. The ViT shows potential but is less stable under noise and lacks the inductive bias of CNNs, which is important for this application. These results can guide the trade-off between diagnostic accuracy and computational cost in industrial predictive maintenance.
Vishal et al. (Sat,) studied this question.
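The noise-robustness test described in the abstract can be illustrated with a minimal sketch of how additive white Gaussian noise is commonly scaled to a target SNR. This is not taken from the paper: the function names `add_awgn` and `measured_snr_db`, the synthetic sine signal, and the choice to define signal power as the mean squared amplitude are all assumptions made for illustration.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (in dB).

    Hypothetical helper: signal power is taken as the mean squared amplitude,
    and the noise variance is set so that 10*log10(P_signal / P_noise) = snr_db.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    noise = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.asarray(clean, dtype=float) ** 2) / np.mean(noise ** 2))


if __name__ == "__main__":
    # Stand-in for a vibration record (assumption: a 50 Hz sine, 1 s at 100 kHz).
    t = np.linspace(0.0, 1.0, 100_000, endpoint=False)
    clean = np.sin(2.0 * np.pi * 50.0 * t)
    rng = np.random.default_rng(0)
    for snr in (1, 3, 5):  # the SNR levels named in the abstract
        noisy = add_awgn(clean, snr, rng)
        print(f"target {snr} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

In a pipeline like the one summarized above, the noisy signal would presumably be passed through the time-frequency transform before classification; whether the authors add noise to the raw signal or at another stage is not stated here.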