Surface cracks are a frequent problem in engineering structures such as roads and buildings. Advances in deep learning have significantly improved the automatic detection of surface cracks. Although recently developed transformer architectures may offer advantages, convolutional neural networks (CNNs) remain the most common approach in this line of research. Used as feature extractors, CNNs thoroughly exploit the local connections of image blocks, which aids detection performance, but they fail to capture global dependencies within image blocks. The transformer's ability to model global dependencies in sequential data has recently drawn attention; however, its attention mechanism incurs a significant computational cost. In this research, we construct a pyramid pooling–powered transformer backbone network for crack detection that uses several pooling operations to generate feature maps with varying strides and receptive fields. The final pooled feature map is created by concatenating the outputs of the pooling layers. The proposed architecture thus captures features more robustly than existing techniques, which improves crack detection accuracy. Systematic experiments demonstrate that the proposed model outperforms the selected state-of-the-art baselines on the pavement crack detection task in terms of precision, recall, and F-measure.
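The pooling-and-concatenation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-channel feature map, the use of average pooling only, the pooling ratios (2, 4, 8), and the helper names `avg_pool2d` and `pyramid_pool` are all assumptions made for illustration. The point it shows is that pooling at several strides and concatenating the flattened results yields a much shorter token sequence than the original feature map, which is what would reduce the cost of attention computed over it.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]


def avg_pool2d(fmap: FeatureMap, k: int) -> FeatureMap:
    """Average-pool a 2D map with a k x k window and stride k (no padding)."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            window = [fmap[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(window) / (k * k))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap: FeatureMap, ratios: Sequence[int] = (2, 4, 8)) -> List[float]:
    """Pool at each ratio (hypothetical choices), flatten, and concatenate."""
    tokens: List[float] = []
    for k in ratios:
        pooled = avg_pool2d(fmap, k)
        # Flatten the pooled map row by row and append it to the sequence.
        tokens.extend(value for row in pooled for value in row)
    return tokens


# Toy 16 x 16 single-channel feature map (assumed size).
fmap = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
tokens = pyramid_pool(fmap)

# Compare the number of positions before and after pyramid pooling.
print(len(fmap) * len(fmap[0]), "->", len(tokens))
```

With these assumed ratios, the 256 positions of the toy map reduce to 64 + 16 + 4 = 84 pooled tokens. A real backbone would operate on multi-channel tensors and may combine different pooling operators.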
Anoop et al. studied this question.