Using video surveillance to identify unsafe miner behaviors is an important technical means of preventing coal mine accidents and strengthening safety control. However, the complex underground environment (for example, cluttered backgrounds and occlusion of personnel) severely degrades human pose estimation and feature extraction, resulting in low accuracy when identifying unsafe behaviors. To address this issue, this paper proposes a miner unsafe behavior recognition algorithm based on an improved AlphaPose (RS-AlphaPose). First, an improved Real-Time Detection Transformer (RTDETR) replaces the original object detection network; a deformable attention mechanism and additional small-object detection layers enhance detection in complex scenes. Second, sliding window attention and channel attention mechanisms are integrated into the pose estimation network to strengthen multi-scale semantics and global context correlation, thereby improving the accuracy of skeleton extraction under occlusion. Finally, a spatio-temporal graph convolutional network is introduced to model the spatio-temporal dependencies of the skeleton sequence and capture the temporal features of dynamic behaviors. On the COCO2017 pose dataset, the algorithm reaches an average pose estimation accuracy of 72.5%, 2.2 percentage points higher than the baseline AlphaPose model. On a self-built miner dynamic behavior dataset, the average recognition accuracy for typical unsafe behaviors such as climbing and crossing reaches 94.5%, 4.5 percentage points higher than the baseline model. The experiments show that the proposed algorithm effectively mitigates interference in complex underground environments, significantly improves the accuracy of dynamic unsafe behavior recognition for miners, and provides a reliable technical solution for safe coal mine production.
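To illustrate the final stage described above, the following is a minimal NumPy sketch of one spatio-temporal graph convolution step over a skeleton sequence. It is not the paper's implementation: the five-joint skeleton, the names `EDGES`, `normalized_adjacency`, and `st_gcn_layer`, and the fixed temporal smoothing kernel are all assumptions made for illustration. A trained network would instead learn the temporal filters and use the full keypoint layout produced by the pose estimator.

```python
import numpy as np

# Hypothetical 5-joint skeleton (head, neck, hip, left hand, right hand).
# This layout is an illustrative assumption, not the paper's joint set.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def st_gcn_layer(x, a_norm, w_spatial, temporal_kernel):
    """Apply one simplified spatio-temporal graph convolution.

    x:               (T, V, C_in) joint features over T frames and V joints
    a_norm:          (V, V) normalized adjacency matrix
    w_spatial:       (C_in, C_out) feature projection
    temporal_kernel: (K,) weights with odd K, shared by all joints and
                     channels (a simplification of a learned temporal conv)
    """
    # Spatial step: aggregate neighboring joints within each frame.
    h = np.einsum("uv,tvc,cd->tud", a_norm, x, w_spatial)

    # Temporal step: zero-padded 1D convolution along the frame axis.
    k = len(temporal_kernel)
    pad = k // 2
    h_pad = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    out = sum(temporal_kernel[i] * h_pad[i:i + h.shape[0]] for i in range(k))

    return np.maximum(out, 0.0)  # ReLU


rng = np.random.default_rng(0)
x = rng.normal(size=(16, NUM_JOINTS, 2))   # 16 frames of 2D joint coordinates
a_norm = normalized_adjacency(EDGES, NUM_JOINTS)
w = rng.normal(size=(2, 8))
out = st_gcn_layer(x, a_norm, w, np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (16, 5, 8)
```

The spatial step mixes information between connected joints within a frame, and the temporal step mixes each joint's features across adjacent frames. Stacking such layers is how a skeleton-based recognizer can separate dynamic behaviors such as climbing from static postures.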
Liu et al. (Sun,) studied this question.