Abstract Road defect detection is essential to road maintenance and safety, but current approaches rarely achieve both the required accuracy and real-time processing. This work introduces a hybrid deep learning architecture that combines EfficientNetV2-B0 with the Convolutional Block Attention Module (CBAM) for high-precision, real-time, multi-class road defect detection. EfficientNetV2-B0 provides strong feature extraction, and CBAM's attention mechanism focuses the network on the regions most relevant to a defect, improving detection accuracy while preserving computational efficiency. We evaluated the system on a curated dataset of 1200 images covering four defect classes (cracks, potholes, patches, and surface defects), where it reached 97% accuracy at 21 ms inference per image on GPU hardware. Comparative experiments show that the hybrid approach offers a better speed-accuracy trade-off than standalone CNNs (EfficientNetV2-B0: 93.5% accuracy) and Vision Transformers (ViT-Tiny: 97.1% accuracy, but 70 ms latency). These results are further supported by a case study in 6th of October City, Giza, Egypt, in which the system recognized and classified key pavement distresses in real urban environments, including fine cracks (98% accuracy), hazardous potholes (96% recall), and complex surface defects (97% precision). With real-time processing (21.5 ms per image) and low computational overhead (1.42 billion FLOPs), the proposed system is well suited to infrastructure monitoring applications. This work advances automated infrastructure monitoring by providing a scalable, high-accuracy, low-latency solution for road defect detection.
Ezz et al. (Mon,) studied this question.
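To make the speed-accuracy comparison in the abstract concrete, the reported per-image latencies can be converted to throughput. The sketch below is illustrative arithmetic only: the helper name `frames_per_second` and the `reported` dictionary are hypothetical, the figures are the ones quoted in the abstract (using the 21.5 ms value for the hybrid model), and it assumes one image is processed at a time with no batching.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Figures quoted in the abstract: (accuracy in %, latency in ms).
# The dictionary itself is a hypothetical container for illustration.
reported = {
    "EfficientNetV2-B0 + CBAM": (97.0, 21.5),
    "ViT-Tiny": (97.1, 70.0),
}

for name, (accuracy, latency) in reported.items():
    fps = frames_per_second(latency)
    print(f"{name}: {accuracy:.1f}% accuracy, {fps:.1f} images/s")

# Ratio of the two latencies: how many times faster the hybrid model is.
speedup = reported["ViT-Tiny"][1] / reported["EfficientNetV2-B0 + CBAM"][1]
print(f"Latency ratio (ViT-Tiny / hybrid): {speedup:.2f}x")
```

Under these assumptions the hybrid model processes roughly 46 images per second against roughly 14 for ViT-Tiny, at an accuracy difference of 0.1 percentage points, which is the trade-off the abstract refers to.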