• Millions of vibration signals are obtained from video pixels via PME for damage detection.
• The phase-wrapping issue in PME signals is addressed by the proposed Unwrapformer.
• Structural damage detection accuracy is considerably improved by the unwrapped signals.

In recent years, deep neural networks have increasingly been used to process vibration signals for structural damage detection. However, their effectiveness depends on large numbers of vibration signals, which are difficult to acquire in practical engineering measurements. To address this limitation, a phase-based motion estimation (PME) technique has been introduced that treats each pixel as an independent sensor, enabling millions of vibration signals to be extracted from a single video. Despite this advantage, all phase-based methods are fundamentally limited by phase wrapping: because phase is 2π-periodic, any phase outside the principal interval is folded back into it, which introduces substantial displacement errors and severely degrades detection accuracy. To overcome this challenge, this paper proposes a novel Transformer-based deep learning framework for structural damage detection. The model integrates a dedicated phase-unwrapping module that computes wrap counts from distorted pixel-level vibration signals and reconstructs the original vibration responses, substantially improving detection precision. More importantly, the model can be trained without densely labeled, time-step-aligned ground-truth vibration data. Instead, it identifies phase-wrapping events by modeling the inherent dependencies in the vibration signals, establishing a weakly supervised learning paradigm that greatly reduces data requirements. The method was validated by detecting subtle bolt looseness in a steel structure, demonstrating its superior performance in structural damage detection.
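The wrap-count idea can be made concrete with the relation φ = φ_w + 2πk, where φ_w is the wrapped phase in (-π, π] and k is an integer wrap count per sample. The sketch below is a classical baseline (Itoh's rule), not the paper's Transformer: it assumes the true phase changes by less than π between consecutive samples, so any larger jump in the wrapped signal is read as a wrap. Noise or fast motion can violate that assumption, which is the gap a learned model that exploits temporal dependencies aims to close. The function names (`wrap_phase`, `wrap_counts`, `unwrap`) and the synthetic signal are hypothetical, and the sketch assumes that displacement is proportional to phase, so correcting the phase also corrects the displacement.

```python
import math
from typing import List

TWO_PI = 2.0 * math.pi


def wrap_phase(phi: float) -> float:
    """Fold any phase value into the principal interval (-pi, pi]."""
    w = math.fmod(phi + math.pi, TWO_PI)
    if w <= 0.0:  # fmod keeps the sign of phi + pi; shift into (0, 2*pi]
        w += TWO_PI
    return w - math.pi


def wrap_counts(wrapped: List[float]) -> List[int]:
    """Estimate the integer wrap count k[t] for each sample (Itoh's rule).

    Assumes the true phase changes by less than pi between consecutive
    samples, so any larger jump in the wrapped signal must be a wrap.
    """
    if not wrapped:
        return []
    counts = [0]
    for prev, cur in zip(wrapped, wrapped[1:]):
        jump = cur - prev
        k = counts[-1]
        if jump < -math.pi:    # true phase crossed +pi going upward
            k += 1
        elif jump > math.pi:   # true phase crossed -pi going downward
            k -= 1
        counts.append(k)
    return counts


def unwrap(wrapped: List[float]) -> List[float]:
    """Reconstruct the continuous phase as phi = phi_w + 2*pi*k."""
    return [w + TWO_PI * k for w, k in zip(wrapped, wrap_counts(wrapped))]


# Hypothetical pixel-level phase signal: a sinusoidal vibration whose phase
# amplitude (5 rad) exceeds pi, so it repeatedly leaves (-pi, pi].
true_phase = [5.0 * math.sin(TWO_PI * t / 50) for t in range(200)]
wrapped = [wrap_phase(p) for p in true_phase]
counts = wrap_counts(wrapped)
recovered = unwrap(wrapped)

print("max error, wrapped:  ", max(abs(w - p) for w, p in zip(wrapped, true_phase)))
print("max error, unwrapped:", max(abs(r - p) for r, p in zip(recovered, true_phase)))
```

The wrapped signal is off by a full 2π wherever a wrap occurs, which mirrors the large displacement errors described above; once k is known, the reconstruction is exact up to floating-point rounding. Here the per-sample phase change is at most about 0.63 rad, so the simple rule succeeds.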
Wang et al. (Sun,) studied this question.