Public space monitoring systems are critical for observing typical human behavior and detecting abnormal activities, especially in high-security environments. With thefts in public spaces on the rise, there is a growing need for intelligent systems that can detect suspicious movements early enough to prevent criminal acts. Although Convolutional Neural Networks (CNNs) are widely used in image classification, they are inadequate for distinguishing abnormal from normal behavior and for identifying criminal activity at an early stage. To overcome these limitations, this study proposes a new hybrid model that combines Mask R-CNN (MRCNN) with Long Short-Term Memory (LSTM) networks for accurate object detection, tracking, and sequential behavior analysis. The main contribution of this study is a multistage anomaly detection pipeline comprising frame conversion, contrast enhancement, background removal, object tracking, and feature extraction. The MRCNN-LSTM framework extracts both spatial and temporal characteristics, allowing precise early-stage anomaly detection. Testing on three benchmark datasets (UCF Crime, Snatch1.0, and CUHK) showed strong performance, including 93.6% accuracy on the UCF Crime dataset. Performance metrics such as observation ratio and time duration were used to assess the responsiveness and effectiveness of the system in real-time surveillance scenarios. This research advances the field of intelligent surveillance by enabling proactive threat mitigation through the early and precise detection of anomalous behavior.
Manju et al. (Sat,) studied this question.
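The preprocessing stages named in the abstract (frame conversion, contrast enhancement, background removal, object tracking, feature extraction) can be illustrated with a minimal NumPy sketch. The abstract does not specify the method behind each stage, so every choice here is an assumption: grayscale conversion with BT.601 luma weights, global histogram equalization, median-background subtraction, and a foreground centroid standing in for the tracker. The sketch does not implement Mask R-CNN or an LSTM, and all function names are hypothetical.

```python
import numpy as np

def to_gray(frame_rgb):
    """Frame conversion: RGB (H, W, 3) -> grayscale (H, W), BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb.astype(np.float64) @ weights

def equalize(gray):
    """Contrast enhancement by global histogram equalization (assumed method)."""
    g = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    hist = np.bincount(g.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return g.copy()
    lut = np.rint((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[g]

def foreground_mask(frames, threshold=30):
    """Background removal: per-pixel temporal median as background (assumed),
    then threshold the absolute difference. Returns a (T, H, W) boolean array."""
    stack = np.stack(frames).astype(np.float64)
    background = np.median(stack, axis=0)
    return np.abs(stack - background) > threshold

def centroid_track(masks):
    """Stand-in for object tracking: foreground centroid (row, col) per frame,
    NaN where a frame has no foreground pixels."""
    track = []
    for mask in masks:
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            track.append((np.nan, np.nan))
        else:
            track.append((rows.mean(), cols.mean()))
    return np.array(track)

def motion_features(track):
    """Feature extraction: frame-to-frame centroid displacement, shape (T-1, 2)."""
    return np.diff(track, axis=0)
```

In a full system along the lines the abstract describes, the centroid would be replaced by per-object masks from the detector, and the resulting feature sequence would be passed to a temporal model rather than inspected directly.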