Abstract

Target occlusion and background clutter cause missed and false detections in safety helmet detection on construction sites. To address these problems, this article proposes a novel ISE-YOLOv11 (Improved Squeeze-and-Excitation based YOLOv11) model. First, a C3K2ₐ deformable convolution module is designed to capture diverse target features and reduce missed detections. Second, to localize targets quickly, an improved attention mechanism, ISE (Improved Squeeze-and-Excitation), replaces the classical attention module in YOLOv11. It positions targets more accurately, extracts richer edge and texture features, and reduces the number of parameters at the same time. Third, a cross-scale feature fusion module, MDFF (Multi-scale Deformable Feature Fusion), is constructed to significantly improve the model's perception of targets against occluded backgrounds and to raise detection accuracy. Experimental results on the SHWD (Safety Helmet Wearing Dataset) show that the improved model maintains real-time performance at 38 FPS while achieving a recall of 90.2% and an accuracy of 92.4%, which are 0.6% and 1.5% higher, respectively, than those of the original YOLOv11 model. In addition, ISE-YOLOv11 achieves better detection performance across various scenarios, demonstrating stronger adaptability to complex and diverse environments. It provides an efficient and reliable technical solution for safety supervision on construction sites and offers new insights into integrating attention mechanisms with single-stage detectors.
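The abstract does not describe the internal changes that make ISE "improved". As a point of reference, the following is a minimal sketch of the classical Squeeze-and-Excitation (SE) block that ISE builds on, not the authors' ISE module. The function names (`se_block`, `_matvec`), the pure-Python list representation of a C×H×W feature map, and the toy weights are illustrative assumptions. The sketch performs the three standard SE steps: global average pooling per channel (squeeze), a bottleneck of two fully connected layers with ReLU and sigmoid (excitation), and per-channel rescaling of the input (scale).

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # feature map indexed [channel][row][col]
Matrix = List[List[float]]          # weight matrix indexed [out][in]


def _matvec(w: Matrix, b: List[float], x: List[float]) -> List[float]:
    """Fully connected layer: w @ x + b (hypothetical helper)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def se_block(x: Tensor3D, w1: Matrix, b1: List[float],
             w2: Matrix, b2: List[float]) -> Tensor3D:
    """Classical SE block (assumed baseline, not ISE itself).

    w1 maps C -> C/r and w2 maps C/r -> C, where r is the reduction ratio.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC + ReLU, then FC + sigmoid gives a gate in (0, 1) per channel.
    h = [max(0.0, v) for v in _matvec(w1, b1, z)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in _matvec(w2, b2, h)]
    # Scale: multiply every activation in channel c by its gate s[c].
    return [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(x)]


# Usage with C = 2 channels of size 2x2 and reduction ratio r = 2 (hidden size 1).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]  # zero weights give sigmoid(0) = 0.5 gates
out = se_block(x, w1, b1, w2, b2)
print(out)
```

With zero second-layer weights every gate equals 0.5, so each channel is halved. In a trained network the gates differ per channel, which lets the block emphasize informative channels (for example, ones responding to helmet edges and textures) and suppress background clutter.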
DU et al. (Mon,) studied this question.