Abstract

This study presents a video analysis system for real-life violence detection that combines transfer learning with image processing techniques. The system uses a pre-trained VGG16 model fine-tuned on a dataset of 8,252 augmented video frames, balanced between violent and non-violent samples. Image preprocessing and augmentation were applied to improve generalization. The model achieved 99.86% training accuracy and 98.36% validation accuracy, indicating robust performance in distinguishing violent from non-violent scenes. Precision (0.98), recall (0.98), and F1-score (0.98) further support the model's effectiveness. The AUC-ROC score of 0.51, however, is close to the chance level of 0.5 and indicates that class discrimination needs improvement. Overall, the results show promise for real-world applications, with potential to enhance public safety through automated violence detection in surveillance systems. Future work will focus on expanding datasets, exploring multimodal inputs, and implementing real-time detection capabilities.
Saha et al. studied this question.
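The metrics named in the abstract (precision, recall, F1-score, and AUC-ROC) can be made concrete with a minimal sketch. This is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions chosen for illustration, and label 1 is assumed to mean "violent". Precision, recall, and F1 are computed from thresholded predictions, while AUC-ROC is computed from the raw scores as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (assumed: violent) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 4 violent and 4 non-violent frames.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints: precision=0.7500 recall=0.7500 f1=0.7500
print(f"auc={roc_auc(y_true, scores):.4f}")
# prints: auc=0.9375
```

Because AUC-ROC depends only on how the scores rank positives against negatives, it requires that the label list and the score list refer to the same samples in the same order. A value near 0.5 corresponds to a ranking that is no better than random.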