Automated road damage detection from Unmanned Aerial Vehicle (UAV) imagery is constrained by small target sizes and complex environmental backgrounds. To address these issues within a low computational budget, this study proposes YOLO-SCX, a computationally efficient object detection architecture built on the YOLOv5n baseline. The methodological novelty of this work lies in the systematic integration of three structural optimizations designed for aerial sensing: (1) the Convolutional Block Attention Module (CBAM) to suppress background noise; (2) a Grouped SPPCSPC module to strengthen multi-scale feature fusion; and (3) a Decoupled Head to optimize the classification and regression tasks independently. The study uses a composite dataset of 1,500 images drawn from the UAV-RDD and CrackForest sources, partitioned into training (1,000), validation (250), and test (250) sets. On the held-out test set, YOLO-SCX achieves a mean Average Precision (mAP@0.5) of 66.3% and a Precision of 79.2%, absolute improvements of 5.8 and 6.0 percentage points, respectively, over the baseline. The model also maintains an inference speed of 185 FPS with 8.7 million parameters, which supports its suitability for real-time edge deployment relative to heavier architectures such as YOLOv7 and YOLOv8.
Yang et al. studied this question.
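The 1,000/250/250 partition described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the images were assigned to each subset, so the seeded random shuffle below is an assumption, and the function name `split_dataset` and the `img_XXXX.jpg` file names are hypothetical placeholders rather than anything taken from the study.

```python
import random


def split_dataset(items, n_train=1000, n_val=250, n_test=250, seed=0):
    """Shuffle items with a fixed seed and cut them into three disjoint subsets.

    Assumption: a plain random split. The source does not say whether the
    actual partition was random, stratified by class, or grouped by source.
    """
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must sum to the number of items")
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 1,500 composite-dataset images.
image_names = [f"img_{i:04d}.jpg" for i in range(1500)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # prints: 1000 250 250
```

Fixing the seed keeps the held-out test set identical across runs, which matters when comparing a modified architecture against its baseline on the same 250 images.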