What question did this study set out to answer?

The study aims to evaluate and compare various deep learning algorithms for the classification of roller bearing faults under different load and speed conditions.

April 13, 2026 | Open Access

Comparative analysis of deep learning algorithms for rolling element bearing fault classification under variable loads and speeds

Key Points

The authors ran controlled tests to collect vibration data from roller bearings with various faults.
Four time-frequency techniques were evaluated for feature visualization: STFT, CWT, WPT, and CQ-NSGT.
The authors propose a hybrid transfer learning architecture that integrates pre-trained CNNs with custom residual blocks.
The proposed models were compared against a baseline CNN and a Vision Transformer (ViT) using fivefold cross-validation.
Model robustness was tested under added noise at several signal-to-noise ratios (SNR).
The EfficientNetB0 hybrid model achieved the highest baseline accuracy, 99.83%.
The EfficientNetB0 hybrid model maintained 98.8% accuracy at an SNR of 1 dB, showing strong robustness to noise.
MobileNetV2 had the shortest training time, 121.4 seconds, at 0.50 GFLOPS.
ViT performed well but was less stable under noisy conditions than the CNN models.
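
The noise test in the key points can be illustrated with a minimal sketch of adding white Gaussian noise at a target SNR. This is not the authors' code: the function names `add_awgn` and `measured_snr_db`, the 50 Hz sine test signal, and the sampling rate are all assumptions chosen for illustration. It relies only on the standard definition SNR(dB) = 10·log10(P_signal / P_noise).

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db`."""
    rng = rng or random.Random()
    # Average signal power: mean of the squared samples.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually achieved, given the clean reference signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal (not from the paper): 50 Hz sine sampled at 10 kHz.
fs = 10_000
clean = [math.sin(2 * math.pi * 50 * n / fs) for n in range(20_000)]

rng = random.Random(0)
for snr in (1, 3, 5):  # the SNR levels named in the abstract
    noisy = add_awgn(clean, snr, rng)
    print(f"target {snr} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

At 1 dB the noise power is about 79% of the signal power, which gives a sense of how demanding the lowest-SNR condition is.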

Abstract

This paper presents a comparative study of deep learning methods for classifying roller bearing faults. We obtained experimental vibration data by conducting controlled tests on roller bearings with various faults under different load conditions to simulate real-world industrial conditions. The methodology first evaluates four time-frequency techniques (STFT, CWT, WPT, and CQ-NSGT), establishing CQ-NSGT as the superior method for feature visualization. We then propose a novel hybrid transfer learning architecture that integrates pre-trained backbones (EfficientNetB0, MobileNetV2, InceptionV3) with custom residual blocks. These hybrid CNNs are compared to a baseline CNN and a Vision Transformer (ViT). Performance was evaluated using fivefold cross-validation and tested against additive white Gaussian noise at signal-to-noise ratios (SNR) of 1 dB, 3 dB, and 5 dB. The EfficientNetB0 hybrid model reached the highest baseline accuracy of 99.83% while also exhibiting excellent robustness, maintaining 98.8% accuracy even at 1 dB SNR. MobileNetV2 was the most computationally efficient model, with a training time of only 121.4 s and 0.50 GFLOPS, making it well suited to edge deployment. ViT shows potential but is less stable under noise and lacks the inductive bias of CNNs, which is important for this application. These results can guide the trade-off between diagnostic accuracy and computational cost in industrial predictive maintenance.
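
The fivefold cross-validation mentioned in the abstract can be sketched as follows. The abstract does not say whether the folds were stratified by fault class, so stratification here is an assumption, as are the helper names `stratified_kfold` and `mean_and_std`, the four class labels, and the sample counts. The sketch shows only how indices are partitioned and how per-fold scores would be summarized, not how any model is trained.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the k folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def mean_and_std(values):
    """Population mean and standard deviation of per-fold scores."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


# Hypothetical dataset (not from the paper): 4 classes, 25 samples each.
classes = ("healthy", "inner_race", "outer_race", "ball")
labels = [c for c in classes for _ in range(25)]

splits = list(stratified_kfold(labels, k=5))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample appears in exactly one test fold, so an accuracy averaged over the five folds uses every sample for evaluation once.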

Cite This Study

Vishal et al. studied this question.

synapsesocial.com/papers/69dc87983afacbeac03e9eaa https://doi.org/10.1038/s41598-026-42592-y
