As the service life of bridges increases, timely and accurate identification of cracks is essential for ensuring structural safety and durability. Traditional inspection methods often rely on 2D images, which lack the reliable depth information needed to assess the severity and progression of defects. 3D point cloud technology complements 2D vision-based bridge defect detection by supplying that depth information for spatial analysis. To address the challenges of bridge defect inspection, we propose a scale-adaptive two-stage cross-modal fusion method that integrates 3D point cloud data with 2D images for accurate spatial defect identification. The approach explicitly represents and integrates multisource knowledge, providing a scientifically grounded and reliable solution to bridge defect detection. It supports knowledge-intensive engineering tasks by combining 3D geometric information with 2D semantic cues, enabling better depth quantification and crack assessment. In the first stage, PointNet++ and a registration algorithm are combined for 3D semantic segmentation and depth quantification of severe cracking defects. In the second stage, a physically consistent 3D–2D–3D cross-modal fusion method detects minor defects missed in the first stage by converting the point cloud into depth maps and performing semantic segmentation on them to quantify the depth of smaller defects. The method is evaluated on a cracked concrete beam. Results show that it is robust across defect scales: PointNet++ attains an mIoU of 0.91 on the crack point cloud for defects deeper than 2 cm, and YOLO11 reaches 85% accuracy on the 2D depth maps for defects shallower than 2 cm. The extracted crack depths showed a strong fit to a gamma distribution.
Xiong et al. (Thu,) studied this question.
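The second-stage conversion of a point cloud into a depth map can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `point_cloud_to_depth_map`, the grid resolution `cell_size`, the choice of the highest point as the reference surface, and the rule of keeping the deepest point per cell are all assumptions made here for illustration. Coordinates are taken to be in centimetres, with depth measured downward from the surface.

```python
import numpy as np

def point_cloud_to_depth_map(points, cell_size, reference_z=None):
    """Rasterize an (N, 3) point cloud into a 2D depth map (hypothetical helper).

    Depth is measured downward from the plane z = reference_z. If no reference
    is given, the highest point is assumed to lie on the intact surface.
    Cells that receive no points are set to NaN.
    """
    points = np.asarray(points, dtype=float)
    xy_min = points[:, :2].min(axis=0)

    # Map each point to an integer (column, row) grid cell.
    cells = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    width, height = cells.max(axis=0) + 1

    if reference_z is None:
        reference_z = points[:, 2].max()
    depth = reference_z - points[:, 2]

    # Keep the deepest point that falls into each cell.
    flat_index = cells[:, 1] * width + cells[:, 0]
    deepest = np.full(height * width, -np.inf)
    np.maximum.at(deepest, flat_index, depth)

    deepest[~np.isfinite(deepest)] = np.nan  # mark empty cells
    return deepest.reshape(height, width)

# Toy cloud on a 1 cm grid: two crack points, one empty cell.
cloud = np.array([
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, -1.5],
    [1.0, 1.0, -3.0], [1.0, 1.0, -2.0], [2.0, 1.0, 0.0],
])
depth_map = point_cloud_to_depth_map(cloud, cell_size=1.0)

# Split cells at the 2 cm depth used in the abstract to separate the two scales.
severe_mask = depth_map > 2.0
```

The resulting array can be saved as an image and passed to a 2D segmentation model, and each pixel maps back to a grid cell in the original coordinates, which is what makes a return step to 3D possible in principle.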