April 17, 2026 · Open Access

A Hybrid YOLO – SE Attention Based Deep Learning Framework for Robust Weapon Detection in Surveillance Imagery

Key Points

The study aims to improve weapon detection accuracy and reliability in surveillance imagery using a hybrid deep learning framework.
Used YOLOv8n for real-time region localization.
Incorporated a squeeze-and-excitation (SE) attention mechanism into the classification stage.
Implemented conditional decision logic to reconcile the detection and classification stages.
Compared three classification backbones: NASNetMobile, DenseNet201, and InceptionV3.
The hybrid NASNetMobile + SE model achieved 96% accuracy on the primary test set, exceeding standalone YOLOv8n at 94%.
Improved pistol precision from 90% to 95% on a challenging ablation dataset.
Adding the SE block reduced inference latency compared with the vanilla backbone.
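
The SE mechanism in the classification stage reweights feature channels. It squeezes each channel to a single number by global average pooling, passes those numbers through a small bottleneck (fully connected, ReLU, fully connected, sigmoid), and rescales each channel by the resulting gate. The sketch below is a minimal pure-Python illustration of that standard squeeze-and-excitation computation, assuming a channel-first feature map and bias-free layers. The paper summary does not state the reduction ratio or where the block attaches to each backbone, and the names `se_block`, `w1`, and `w2` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # indexed as [channel][row][col]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _matvec(w: Matrix, x: List[float]) -> List[float]:
    # Dense layer without bias: one dot product per output unit.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def se_block(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Squeeze-and-excitation channel recalibration (bias-free sketch).

    x  -- C channels, each an H x W grid
    w1 -- (C // r) x C reduction weights (r = reduction ratio)
    w2 -- C x (C // r) expansion weights
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in _matvec(w1, z)]
    gates = [_sigmoid(v) for v in _matvec(w2, hidden)]
    # Scale: multiply every spatial position of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


# Usage: C = 2 channels of 2 x 2, reduction ratio r = 2, hand-picked weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[0.5, 0.5]]        # 1 x 2: squeezed [1, 3] -> hidden [2]
w2 = [[1.0], [-1.0]]     # 2 x 1: hidden [2] -> logits [2, -2]
out = se_block(features, w1, w2)
# Channel 0 is scaled by sigmoid(2), channel 1 by sigmoid(-2).
```

Because the gates depend on global channel statistics, the block adds only two small dense layers per use; this is one reason SE blocks are often described as lightweight relative to the backbone they augment.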

Abstract

Weapon detection in surveillance imagery is a critical task for ensuring public safety in high‐risk environments. While single‐stage object detectors such as YOLOv8 localize objects quickly, their semantic reliability often degrades when they must distinguish lethal weapons from visually similar non‐threat objects in complex scenes. To address this limitation, this work proposes a hybrid weapon detection framework that decouples object localization from semantic verification. The system uses a YOLOv8n detector for real‐time region localization, followed by an attention‐enhanced classification stage that incorporates a Squeeze‐and‐Excitation (SE) mechanism. A novel conditional decision logic manages consensus between the two stages and includes a confidence‐weighted fail‐safe to maintain high recall for unambiguous threats. We evaluated three classification backbones (NASNetMobile, DenseNet201, and InceptionV3) under identical constraints. On a primary test set, the hybrid NASNetMobile + SE model achieves 96% accuracy, outperforming the standalone YOLOv8n (94%). More critically, on a challenging ablation dataset, the hybrid framework improved pistol precision from 90% to 95%. Furthermore, adding the SE block reduced inference latency compared with the vanilla backbone. The proposed system represents a “Pareto‐optimal” solution for edge‐deployed security infrastructure, prioritizing high‐precision firearm recognition without compromising operational latency.
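
The abstract describes the consensus logic only at a high level, so the following is a hypothetical sketch of one way to combine the two stages for each detected region: accept when the stages agree, keep a very confident detection as a fail-safe, let a confident classifier relabel the weapon class, and otherwise suppress the box. The reject label `non_weapon`, the thresholds 0.85 and 0.5, and the names `StageOutput` and `fuse` are assumptions, and the gate here is a simple confidence threshold rather than the paper's unspecified confidence weighting.

```python
from dataclasses import dataclass
from typing import Optional

NON_THREAT = "non_weapon"  # hypothetical label for the classifier's reject class


@dataclass
class StageOutput:
    label: str          # predicted class, e.g. "pistol" or "knife"
    confidence: float   # score in [0, 1]


def fuse(
    detector: StageOutput,
    classifier: StageOutput,
    failsafe_conf: float = 0.85,   # hypothetical threshold
    min_cls_conf: float = 0.5,     # hypothetical threshold
) -> Optional[str]:
    """Combine detector and classifier outputs for one region.

    Returns the final weapon label, or None to suppress the region.
    """
    # 1. Consensus: both stages agree on the class.
    if detector.label == classifier.label:
        return detector.label
    # 2. Fail-safe: keep a highly confident detection even if the classifier
    #    disagrees, so unambiguous threats are not dropped (protects recall).
    if detector.confidence >= failsafe_conf:
        return detector.label
    # 3. Relabel: a confident classifier can correct the weapon class.
    if classifier.label != NON_THREAT and classifier.confidence >= min_cls_conf:
        return classifier.label
    # 4. Otherwise treat the box as a likely false positive.
    return None


print(fuse(StageOutput("pistol", 0.60), StageOutput("pistol", 0.90)))    # pistol
print(fuse(StageOutput("pistol", 0.92), StageOutput(NON_THREAT, 0.70)))  # pistol (fail-safe)
print(fuse(StageOutput("pistol", 0.60), StageOutput(NON_THREAT, 0.70)))  # None
print(fuse(StageOutput("knife", 0.60), StageOutput("pistol", 0.80)))     # pistol
```

Placing the fail-safe check before the classifier check is what lets a strong detection survive a classifier veto. Lowering `failsafe_conf` favors recall; raising it hands more decisions to the verification stage, which is where the precision gains of a hybrid design would come from.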

Cite This Study

Bithi et al. (2026). A Hybrid YOLO – SE Attention Based Deep Learning Framework for Robust Weapon Detection in Surveillance Imagery. https://doi.org/10.1002/eng2.70736

synapsesocial.com/papers/69e1cf985cdc762e9d8587e0