In industrial environments such as construction sites and factories, video surveillance plays a vital role in safety assurance. Ensuring compliance with head protective equipment (HPE) regulations is particularly critical because it significantly reduces the risk of accidents, injuries, and property damage. However, conventional HPE detection methods, which predominantly analyze single-frame images, fail to leverage the temporal information inherent in video streams. To address this limitation, we propose the Surveillance Object Detection via Spatiotemporal Information Network (SST-Net), a framework that enhances detection by integrating hierarchical spatial features with augmented temporal cues. Furthermore, because annotated industrial video datasets suitable for training spatiotemporal models are scarce, we introduce a strategy to generate pseudo spatiotemporal information. This strategy enables SST-Net to be trained on readily available image datasets while retaining the capacity for temporal reasoning during inference. To evaluate SST-Net’s performance, we developed and publicly released two large-scale, high-quality HPE datasets. Comprehensive experiments demonstrate that SST-Net achieves a state-of-the-art average precision (AP) of 89.5%, surpassing existing leading models. This improvement underscores the effectiveness of our approach in harnessing spatiotemporal information for robust and accurate HPE detection in industrial surveillance scenarios.
Liu et al. (Fri,) studied this question.