Automated monitoring of safety helmet use at construction sites is essential for preventing injuries and ensuring safety compliance. Although object detection methods have been used extensively in prior research, direct comparisons remain difficult because the studies rely on different data sets, many of which are not publicly accessible. In addition, practical real-time monitoring still requires better detection accuracy and computational efficiency. To address these limitations, this study evaluates You Only Look Once version 10 (YOLOv10) models for classifying safety helmets and nonsafety helmets in images collected from surveillance and body-worn cameras. It benchmarks convolutional neural network (CNN)-based backbones against transformer-based architectures, including the vision transformer (ViT), Swin transformer, pyramid vision transformer, MobileViT, and axial transformer, within the YOLOv10 framework. Among these, the Swin transformer performed best, achieving the highest AP50 scores. For surveillance images, it attained a mean AP (mAP) of 94.24%, with an AP50 of 96.55% for safety helmets and 91.92% for nonsafety helmets. For body-worn camera images, it achieved an mAP of 90.86%, with an AP50 of 93.25% for safety helmets and 88.47% for nonsafety helmets. Validation on four benchmark data sets further confirmed its reliability. The study concludes by discussing the practical applications, limitations, and potential future enhancements of the proposed YOLOv10-based approach.
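The reported mAP figures are consistent with an unweighted mean of the two per-class AP50 values. The short Python sketch below checks that arithmetic using only the numbers quoted in the abstract. The averaging rule is an assumption inferred from those numbers, not a definition stated by the authors, and the names `ap50`, `mean_ap`, and `map_by_source` are illustrative only.

```python
from statistics import mean

# Per-class AP50 values (percent) as quoted in the abstract for the
# Swin transformer backbone.
ap50 = {
    "surveillance": {"safety_helmet": 96.55, "nonsafety_helmet": 91.92},
    "body_worn": {"safety_helmet": 93.25, "nonsafety_helmet": 88.47},
}


def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP values.

    Assumption: mAP here is the plain average over the two classes.
    """
    return mean(per_class_ap.values())


# One mAP value per image source.
map_by_source = {source: mean_ap(scores) for source, scores in ap50.items()}

for source, value in map_by_source.items():
    print(f"{source}: mAP = {value:.3f}%")
```

Under this assumption the surveillance mean comes to about 94.235%, which agrees with the reported 94.24% after rounding, and the body-worn mean comes to about 90.86%, matching the reported value.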
Wang et al. studied this question.