Reliable bearing fault diagnosis plays an important role in maintaining the safety and performance of rotating machinery in industrial systems. Although deep learning models have achieved remarkable success in this field, their dependence on a single feature-extraction approach often restricts the diversity of learned representations and limits diagnostic accuracy. To overcome this limitation, this study proposes an attention-guided dual-path framework that integrates spatial and time–frequency feature learning with transformer-based classification for precise fault identification. In the proposed framework, vibration signals collected from an experimental bearing test rig are simultaneously processed through two complementary pipelines: one converts the signals into two-dimensional matrix images to extract spatial features, while the other transforms them into continuous wavelet transform (CWT) scalograms to capture fine-grained temporal and spectral information. The extracted features are fused through a lightweight transformer encoder with an attention mechanism that dynamically emphasizes the most informative representations. This fusion enables the model to effectively capture cross-domain dependencies and enhance discriminative capability. Experimental validation on an industrial vibration dataset demonstrates that the proposed model achieves 99.87% classification accuracy, outperforming conventional CNN and transformer-based approaches. The results confirm that integrating multi-domain features with attention-driven fusion significantly improves the robustness and generalization of deep learning models for intelligent bearing fault diagnosis.
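As a rough illustration of the two input representations described above, the following minimal sketch turns one vibration segment into a two-dimensional matrix image and into a CWT scalogram. The abstract does not give the image size, the wavelet family, the scales, or the sampling setup, so the 32×32 image, the Morlet wavelet with `w0 = 6`, the scale range, the synthetic test signal, and the function names `signal_to_image` and `morlet_scalogram` are all assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def signal_to_image(signal, size=32):
    """Reshape the first size*size samples into a min-max normalised 2-D matrix.

    The image size is an assumed value; the abstract does not state one.
    """
    segment = np.asarray(signal[: size * size], dtype=float)
    if segment.size < size * size:
        raise ValueError("signal too short for the requested image size")
    lo, hi = segment.min(), segment.max()
    scaled = (segment - lo) / (hi - lo) if hi > lo else np.zeros_like(segment)
    return scaled.reshape(size, size)


def morlet_scalogram(signal, scales, w0=6.0):
    """Magnitude of a Morlet CWT computed by direct convolution.

    Assumes the signal is longer than the widest wavelet (about 8 * max(scales)
    samples), so that mode="same" keeps the signal length.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for s in scales:
        half = int(np.ceil(4 * s))            # support of roughly +/- 4 scale units
        t = np.arange(-half, half + 1) / s    # time axis in units of the scale
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(s)
        rows.append(np.abs(np.convolve(signal, wavelet, mode="same")))
    return np.vstack(rows)                    # shape: (len(scales), len(signal))


# Hypothetical vibration segment: a carrier tone, periodic impacts, and noise.
rng = np.random.default_rng(0)
n = np.arange(1024)
segment = np.sin(2 * np.pi * 0.05 * n) + 0.1 * rng.standard_normal(n.size)
segment[::128] += 2.0                         # crude stand-in for fault impulses

image = signal_to_image(segment, size=32)             # spatial-path input
scalogram = morlet_scalogram(segment, np.arange(1, 33))  # time-frequency-path input
```

In a dual-path model of the kind described, each array would feed its own feature extractor before the attention-based fusion stage; that stage is not sketched here because the abstract does not specify its layout.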
Saif Ullah studied this question.