What does this research mean for the field?

A hybrid ResNet and Bidirectional LSTM spatio-temporal framework enables reliable, real-time CCTV anomaly detection on standard hardware when deployed as a full-stack web application. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to develop an efficient real-time framework for detecting anomalies in CCTV surveillance videos.

June 3, 2026
Open Access

Real-Time CCTV Anomaly Detection Using a Hybrid ResNet + Bidirectional LSTM Spatio-Temporal Framework: A Deployable Web-Based Surveillance System

Key Points

The study aims to develop an efficient real-time framework for detecting anomalies in CCTV surveillance videos.
The hybrid model combines a ResNet-based convolutional encoder for per-frame spatial feature extraction with a Bi-LSTM for temporal modelling.
The model is trained on the UCF-Crime Anomaly Detection Dataset, which covers 13 anomaly categories alongside normal activity videos.
The system is deployed as a web application with a Flask API backend and a Vue.js frontend, and it runs locally without cloud dependence.
The authors report reliable binary (Normal vs. Anomalous) classification on the UCF-Crime test set with high inference responsiveness.
Compared with benchmarks such as the Sultani et al. MIL framework and frame-only CNN detectors, the system adds spatio-temporal modelling in a single deployable package.
The interface supports dual-mode input: video-file upload and live webcam capture.
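
To make the hybrid architecture in the key points easier to picture, the following Python sketch traces array shapes through the encoder, the Bi-LSTM, and the binary classifier. The 20-frame, 224×224 input comes from the abstract. The 512-dimensional frame feature (the pooled output size of ResNet-18 and ResNet-34), the hidden size of 256, and every function and field name are assumptions for illustration, since this summary does not state the ResNet variant or the LSTM width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineShapes:
    """Array shapes at each stage for a single clip (no batch dimension)."""
    clip: tuple            # (frames, channels, height, width)
    frame_features: tuple  # (frames, feature_dim)
    bilstm_output: tuple   # (frames, 2 * hidden_size)
    logits: tuple          # (num_classes,)


def trace_shapes(num_frames: int = 20,
                 height: int = 224,
                 width: int = 224,
                 feature_dim: int = 512,   # assumed: ResNet-18/34 pooled output
                 hidden_size: int = 256,   # assumed: not given in the summary
                 num_classes: int = 2) -> PipelineShapes:
    """Return the shape of the data after each stage of the hybrid model."""
    # RGB clip sampled from the surveillance video.
    clip = (num_frames, 3, height, width)
    # The convolutional encoder maps every frame to one feature vector.
    frame_features = (num_frames, feature_dim)
    # A bidirectional LSTM concatenates forward and backward hidden states,
    # so each time step carries 2 * hidden_size values.
    bilstm_output = (num_frames, 2 * hidden_size)
    # A final classifier reduces the sequence to Normal vs. Anomalous scores.
    logits = (num_classes,)
    return PipelineShapes(clip, frame_features, bilstm_output, logits)


if __name__ == "__main__":
    print(trace_shapes())
```

Swapping in a deeper encoder would only change `feature_dim` (for example, 2048 for ResNet-50), which is one way to read the claim that the modular design supports model upgrades.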

Abstract

The rapid proliferation of CCTV surveillance cameras across public and private spaces has rendered continuous manual monitoring inefficient, labour-intensive, and prone to operator fatigue. This work presents a real-time CCTV anomaly detection framework that automatically classifies surveillance video as Normal or Anomalous using a hybrid spatio-temporal deep learning pipeline. The proposed model couples a ResNet-based convolutional encoder for per-frame spatial feature extraction with a Bidirectional Long Short-Term Memory (Bi-LSTM) network for forward-and-backward temporal sequence modelling, operating on uniformly sampled sequences of 20 frames at a resolution of 224×224. The pipeline is trained on the UCF-Crime Anomaly Detection Dataset (Kaggle), comprising 13 anomaly categories alongside normal activity videos. The trained model is deployed as a full-stack web application: a Flask REST API backend (port 5000) serves the inference engine and exposes endpoints for both video-file upload and live webcam ingestion, while a Vue.js 3 frontend (Vite, port 5173) delivers a dual-mode user interface with drag-and-drop upload, WebRTC live capture, real-time backend health monitoring, and colour-coded result rendering. The system operates without specialized GPU hardware and avoids cloud dependence, making it suitable for low-cost local deployment. Experimental validation on the UCF-Crime test set confirms reliable binary classification with high inference responsiveness, and the modular two-tier architecture (app.py for routing, inference.py for model logic) supports straightforward future model upgrades. Compared to existing benchmark approaches such as the Sultani et al. MIL framework and frame-only CNN detectors, the proposed system simultaneously delivers spatio-temporal modelling, dual-mode input, and an accessible web interface in one deployable package.
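
The abstract states that the model operates on uniformly sampled sequences of 20 frames. One plausible way to choose those frames is sketched below in Python. The segment-midpoint rule, the handling of short clips, and the function name `uniform_frame_indices` are assumptions for illustration; the summary does not say how the authors implement the sampling.

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 20) -> list:
    """Pick num_samples frame indices spread evenly across a clip.

    The clip is split into num_samples equal-width segments and the
    midpoint frame of each segment is selected. Clips shorter than
    num_samples simply repeat some frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")

    segment = total_frames / num_samples
    indices = []
    for i in range(num_samples):
        # Midpoint of the i-th segment, clamped to the last valid frame.
        idx = int((i + 0.5) * segment)
        indices.append(min(idx, total_frames - 1))
    return indices


if __name__ == "__main__":
    # A 200-frame clip yields one index from the middle of every 10 frames.
    print(uniform_frame_indices(200))
```

Each selected frame would then be resized to 224×224 before being passed to the encoder, which is left out here because it depends on the image library in use.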


Cite This Study

Mamatha et al. studied this question.

synapsesocial.com/papers/6a1fc4e4dee9eb8c0dce6683
https://doi.org/10.5281/zenodo.20484543
