Violence is increasingly prevalent in most countries worldwide, so it is important to develop an effective system that can detect violence, raise alerts, and help prevent it through video surveillance. In this study, we develop an automated system for detecting violent and non-violent incidents in video footage. Specifically, we introduce a method that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) to classify videos as violent or non-violent using both image and motion features. The CNN is based on the VGG19 architecture, and the recurrent component uses Convolutional Long Short-Term Memory (ConvLSTM). The CNN extracts meaningful representations from the input images, and these features are then fed into the RNN, which learns contextual information from them. Experimental results are promising: the approach achieves an accuracy of 97.96% on the Hockey dataset, 97.92% on the combined Hockey and Movies dataset, and 96.9% on the combined Hockey, Movies, and Violent Flow dataset.
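To make the data flow of such a CNN-plus-ConvLSTM pipeline concrete, the following minimal sketch traces tensor shapes from a video clip through a VGG19 feature extractor and a single ConvLSTM layer. It is an illustration only, not the authors' configuration: the clip length (20 frames), the 224x224 input size, the 64 ConvLSTM filters, the "same" padding, and the two-score output are all assumptions, and the function names are hypothetical. Only the VGG19 block layout (16 convolutional layers in five blocks, each followed by 2x2 max pooling) is taken from the standard architecture.

```python
# VGG19 convolutional blocks: (number of 3x3 conv layers, output channels).
# Each block ends with a 2x2 max-pool of stride 2, halving height and width.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def vgg19_feature_shape(height, width):
    """Shape (H, W, C) of the last VGG19 pooling output for one RGB frame."""
    channels = 3
    for _num_convs, out_channels in VGG19_BLOCKS:
        # 3x3 convs with 'same' padding keep H and W; only channels change.
        channels = out_channels
        # The max-pool at the end of the block halves the spatial size.
        height, width = height // 2, width // 2
    return height, width, channels


def convlstm_output_shape(frames, feat_shape, filters, return_sequences=False):
    """Output shape of a ConvLSTM layer, assuming 'same' padding and stride 1."""
    h, w, _ = feat_shape
    if return_sequences:
        return (frames, h, w, filters)  # one feature map per time step
    return (h, w, filters)              # only the final hidden state


def trace_pipeline(frames=20, height=224, width=224, lstm_filters=64):
    """Trace shapes for one clip; all default sizes are assumed, not sourced."""
    feat = vgg19_feature_shape(height, width)
    return {
        "clip": (frames, height, width, 3),
        "cnn_features": (frames,) + feat,  # CNN applied to each frame
        "convlstm_state": convlstm_output_shape(frames, feat, lstm_filters),
        "prediction": (2,),                # violent vs. non-violent scores
    }


if __name__ == "__main__":
    for stage, shape in trace_pipeline().items():
        print(f"{stage:15s} {shape}")
```

The point of the sketch is the division of labor: the CNN is applied to each frame independently and produces a spatial feature map, while the ConvLSTM consumes the sequence of feature maps and keeps their spatial layout in its hidden state, which is what lets the recurrent stage model motion between frames.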
Trinh et al. studied this question.