Abstract Due to the increasing incidence of crime and violence, it is important to develop technology that automatically detects violence in security camera footage. Although law enforcement agencies have ample footage, they lack the human resources to analyze it and detect violence in a timely manner. An unmanned system that analyzes violent incidents on their behalf can therefore play an important role in ensuring security. In such footage, false recognitions occur in which some frames are labeled as normal while the abnormal event is still in progress. Better-trained models and various methods have been proposed to overcome these common difficulties in crowd analysis. In this study, we aim to identify the start and end frames of violent events with minimum error in indoor or outdoor camera footage. To this end, we first create a model that enriches sequential video frames containing violence using the mixup data augmentation method, so that the system can learn more features from limited training datasets and training performance improves. Second, we propose a method that filters out frames containing false positives from the outputs of a deep learning based system, which makes the video analysis more effective. Experimental results show that the proposed method achieves a success rate of approximately 99.3%. False positives are thereby significantly reduced, and the start, end, and action times of violent events that continue for more than one second across consecutive frames can be accurately detected.
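The abstract does not spell out the filtering step, so the following is only a minimal sketch of one way such post-processing could work, not the authors' method. It assumes per-frame binary labels (1 = violent, 0 = normal) from some upstream classifier. The function names `runs` and `filter_events`, the frame rate, and the `max_gap_s` threshold are hypothetical choices made for illustration; only the one-second minimum duration comes from the abstract.

```python
from typing import List, Tuple


def runs(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Return (value, start, end_exclusive) for each run of equal labels."""
    out = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((labels[start], start, i))
            start = i
    return out


def filter_events(
    labels: List[int],
    fps: int = 30,             # assumed frame rate
    max_gap_s: float = 0.5,    # assumed: longest "normal" gap to bridge
    min_event_s: float = 1.0,  # from the abstract: events longer than one second
) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame, duration_seconds) per retained event."""
    max_gap = int(max_gap_s * fps)
    min_event = int(min_event_s * fps)
    smoothed = list(labels)

    # Pass 1: relabel short runs of "normal" frames that sit between two
    # violent runs, i.e. frames marked normal while the event continues.
    all_runs = runs(smoothed)
    for idx, (value, start, end) in enumerate(all_runs):
        interior = 0 < idx < len(all_runs) - 1
        if value == 0 and interior and (end - start) <= max_gap:
            smoothed[start:end] = [1] * (end - start)

    # Pass 2: keep only violent runs that last longer than min_event frames;
    # shorter runs are treated as false positives and dropped.
    events = []
    for value, start, end in runs(smoothed):
        if value == 1 and (end - start) > min_event:
            events.append((start, end - 1, (end - start) / fps))
    return events


if __name__ == "__main__":
    # Synthetic labels at 10 fps: one event with a 2-frame dropout,
    # followed by an isolated 3-frame false positive.
    demo = [0] * 5 + [1] * 8 + [0] * 2 + [1] * 7 + [0] * 10 + [1] * 3 + [0] * 5
    print(filter_events(demo, fps=10))
```

In this sketch the two-frame dropout is bridged so the event is reported as a single interval, while the isolated three-frame detection is discarded because it lasts less than one second. Both thresholds would need tuning against real footage.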
Kutlugün et al. studied this question.