Rolling bearing fault diagnosis under complex, noisy operating conditions requires not only high diagnostic accuracy but also interpretability that can be quantitatively verified against physically meaningful excitation structures. However, many existing deep learning approaches rely on a single time–frequency (TF) representation and offer only limited, non-verifiable links between model decisions and the original vibration patterns. To address this issue, we propose MBT-XAI, a multi-wavelet TF fusion network with a Token-to-Spectrum Traceback (TST) mechanism for structure-preserving, physics-consistent interpretability. Three complementary wavelets (Morlet, Mexican Hat, and Complex Morlet) are used to construct multi-view TF representations, which are encoded as RGB channels and adaptively fused via cross-channel attention within a Transformer backbone. TST maps patch-token attributions back to the TF domain, so that the physics consistency of the explanations can be evaluated quantitatively with overlap-based metrics. Experiments on the public CWRU dataset and an industrial IMUST dataset show that MBT-XAI achieves 98.13 ± 0.24% and 96.23 ± 0.31% accuracy, respectively, at SNR = 0 dB, outperforming the strongest baseline by 2.83% and 2.43%. Under additive white Gaussian noise (AWGN), MBT-XAI maintains 95.44 ± 0.38% and 93.45 ± 0.47% accuracy on CWRU, and 95.80 ± 0.33% and 92.91 ± 0.51% on IMUST, at SNR = −2 and −4 dB, respectively. Under colored noise, the proposed method also remains robust to pink and brown noise at the same SNR levels. Quantitative interpretability evaluation further indicates strong alignment between salient frequency regions and theoretical fault-characteristic bands, with IoU = 80.21 ± 0.86% and Coverage = 91.70 ± 0.63%. In addition, MBT-XAI has 10.393 M parameters and requires 10.678 GFLOPs, with an inference latency of 14.7 ms per sample (batch size = 1) on an NVIDIA GeForce RTX 3060 GPU.
These results suggest that multi-wavelet TF modeling with attention-based fusion and TF-level traceback provides an accurate, robust, and physics-consistent framework for intelligent bearing fault diagnosis.
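The robustness results are reported at fixed SNRs under white (AWGN), pink, and brown noise. The following is a minimal NumPy sketch of contaminating a vibration signal at a target SNR. The FFT-domain 1/f^β spectral shaping (β = 0, 1, 2 for white, pink, brown) and the function names `colored_noise` and `add_noise_at_snr` are hypothetical illustrations, not the authors' published noise generator.

```python
import numpy as np


def colored_noise(n, kind="white", rng=None):
    """Zero-mean, unit-variance noise whose power spectrum falls off as 1/f**beta.

    beta = 0 (white), 1 (pink), 2 (brown). FFT-domain shaping is an assumed
    generator, not necessarily the one used to evaluate MBT-XAI.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = {"white": 0.0, "pink": 1.0, "brown": 2.0}[kind]
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    spectrum *= freqs ** (-beta / 2.0)  # amplitude ~ f^(-beta/2) -> power ~ f^(-beta)
    noise = np.fft.irfft(spectrum, n=n)
    noise -= noise.mean()
    return noise / noise.std()  # unit variance, so mean(noise**2) == 1


def add_noise_at_snr(x, snr_db, kind="white", rng=None):
    """Return (x + noise, noise) scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    x = np.asarray(x, dtype=float)
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise = colored_noise(x.size, kind, rng) * np.sqrt(p_noise)
    return x + noise, noise
```

For example, `add_noise_at_snr(x, -4.0, kind="pink")` would produce a −4 dB pink-noise version of `x`. Rescaling the noise after shaping keeps the SNR definition identical across noise colors, so only the spectral distribution of the contamination changes.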
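To illustrate the multi-wavelet TF construction, here is a hedged sketch that computes three continuous wavelet transform (CWT) magnitude maps by direct convolution and stacks them as an RGB-like tensor. The wavelet formulas follow the conventional PyWavelets definitions (`morl`, `mexh`, `cmorB-C`). The Complex Morlet parameters B = 1.5 and C = 1.0, the scale range, the channel order, the per-map min-max normalization, and all function names are assumptions, and the cross-channel attention and Transformer backbone are not reproduced.

```python
import numpy as np


def morlet(t):
    # Real Morlet, as in PyWavelets' "morl": exp(-t^2/2) * cos(5t)
    return np.exp(-t ** 2 / 2) * np.cos(5 * t)


def mexican_hat(t):
    # Mexican Hat (normalized negative second derivative of a Gaussian), as in "mexh"
    return 2 / (np.sqrt(3) * np.pi ** 0.25) * (1 - t ** 2) * np.exp(-t ** 2 / 2)


def complex_morlet(t, bandwidth=1.5, center=1.0):
    # Complex Morlet in the "cmorB-C" form; B = 1.5 and C = 1.0 are assumed values
    gauss = np.exp(-t ** 2 / bandwidth) / np.sqrt(np.pi * bandwidth)
    return gauss * np.exp(2j * np.pi * center * t)


def cwt_magnitude(x, wavelet, scales, support=8.0):
    """|W(s, b)| with W(s, b) = sum_n x[n] * conj(psi((n - b) / s)) / sqrt(s)."""
    rows = []
    for s in scales:
        half = int(np.ceil(support * s))
        t = np.arange(-half, half + 1) / s  # wavelet sampled at scale s
        kernel = np.conj(wavelet(t))[::-1] / np.sqrt(s)  # reversed conjugate -> correlation
        full = np.convolve(x, kernel)
        rows.append(full[half:half + len(x)])  # center crop back to len(x)
    return np.abs(np.array(rows))


def multiwavelet_rgb(x, scales):
    """Stack three CWT magnitude maps into an (n_scales, len(x), 3) array in [0, 1].

    Assumed channel order: R = Morlet, G = Mexican Hat, B = Complex Morlet.
    """
    maps = [cwt_magnitude(x, w, scales) for w in (morlet, mexican_hat, complex_morlet)]
    norm = [(m - m.min()) / (m.max() - m.min() + 1e-12) for m in maps]
    return np.stack(norm, axis=-1)
```

Each channel sees the same signal through a different wavelet shape: the real Morlet and Mexican Hat respond to oscillatory and impulsive content differently, while the Complex Morlet magnitude gives a smooth envelope. In a full pipeline the stacked array would typically be resized to the backbone's input resolution before patch embedding; that step is omitted here.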
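The traceback and overlap metrics can be made concrete with a small sketch under stated assumptions. Patch-token scores are mapped back to the TF grid by nearest-neighbor upsampling (`np.kron`), and the salient region S is the top fraction of TF attributions. The theoretical band mask B marks frequency rows within ±`half_width` of given center frequencies (for example, harmonics of a computed fault characteristic frequency such as BPFO). IoU is |S ∩ B| / |S ∪ B|, and Coverage is taken here as |S ∩ B| / |B|. The Coverage definition, the thresholding rule, and the function names `traceback_to_tf`, `salient_mask`, `band_mask`, and `iou_and_coverage` are hypothetical, not the paper's exact TST formulation.

```python
import numpy as np


def traceback_to_tf(token_scores, patch_size):
    """Nearest-neighbor upsampling of a (freq_tokens, time_tokens) score grid to TF resolution."""
    return np.kron(token_scores, np.ones((patch_size, patch_size)))


def salient_mask(tf_attr, keep_fraction=0.2):
    """Keep the top `keep_fraction` of TF attributions (assumed thresholding rule)."""
    threshold = np.quantile(tf_attr, 1.0 - keep_fraction)
    return tf_attr >= threshold


def band_mask(freqs, n_time, centers, half_width):
    """Mark TF rows whose frequency lies within +/- half_width of any band center."""
    rows = np.zeros(freqs.shape, dtype=bool)
    for c in centers:
        rows |= np.abs(freqs - c) <= half_width
    return np.repeat(rows[:, None], n_time, axis=1)


def iou_and_coverage(salient, band):
    """IoU = |S & B| / |S | B|; Coverage (assumed) = |S & B| / |B|."""
    inter = np.logical_and(salient, band).sum()
    union = np.logical_or(salient, band).sum()
    band_size = band.sum()
    iou = inter / union if union else 0.0
    coverage = inter / band_size if band_size else 0.0
    return float(iou), float(coverage)
```

With this Coverage definition, a model can score high Coverage by highlighting a large area, which is why it is usually read together with IoU: IoU penalizes salient regions that spill outside the theoretical bands.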
Fan et al. studied this question.