Ensuring the safety of public spaces is a critical responsibility for law enforcement agencies. Traditional closed-circuit television (CCTV) surveillance demands significant workforce resources, and operators must monitor multiple camera feeds simultaneously. While computer vision techniques have been introduced to assist in anomaly detection, they typically require extensive training data and lack generalizability – each camera scene is unique and thus necessitates a custom-trained model to achieve robust performance. Anomaly detection remains a challenging research area, largely due to the scarcity of high-quality datasets. Existing computer vision datasets often consist of short videos with few people and limited types of anomalies. Repurposing trained models to accommodate new anomaly instances is therefore tedious and costly. To address these limitations, we propose a robust, unsupervised approach that adapts automatically to each camera scene without retraining.
Sormunen et al. studied this question.