March 3, 2026 | Open Access

A Multi-Stage Backdoor Detection (MSBD) Framework

Key Points

The Multi-Stage Backdoor Detection (MSBD) framework achieved an average F1-score of 99.0% across three benchmark datasets, consistently outperforming STRIP.
MSBD combines influence-based screening, trigger inversion, neuron activation analysis, and runtime detection into a defence-in-depth pipeline that spans training, post-training model inspection, and deployment.
The evaluation covered three benchmark datasets (MNIST, CIFAR-10, GTSRB) under a BadNets-style backdoor attack and compared MSBD against five representative defences.
The results suggest that multi-stage defences can significantly improve robustness over traditional single-stage methods.
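The evaluation's threat model is BadNets-style data poisoning: a small trigger is stamped onto a fraction of the training images, and those images are relabelled to an attacker-chosen class. The following is a minimal pure-Python sketch of that poisoning step. It assumes grayscale images stored as nested lists, a square patch in the bottom-right corner, and a single target label. The patch size, location, poisoning rate, and the names `stamp_trigger` and `poison_dataset` are illustrative choices, not the configuration used in the paper.

```python
import random
from typing import List, Tuple

# A grayscale image as rows of pixel intensities in [0, 1] (illustrative format).
Image = List[List[float]]
Sample = Tuple[Image, int]


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `img` with a size x size patch set to `value`
    in the bottom-right corner: a BadNets-style patch trigger."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(data: List[Sample], target: int, rate: float = 0.1,
                   seed: int = 0) -> Tuple[List[Sample], List[int]]:
    """Stamp the trigger on a random `rate` fraction of the samples and
    relabel them to the attacker's `target` class. The input list is not
    modified; returns the poisoned copy and the sorted poisoned indices."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(data)))
    idx = sorted(rng.sample(range(len(data)), n_poison))
    poisoned = list(data)
    for i in idx:
        img, _ = data[i]
        poisoned[i] = (stamp_trigger(img), target)
    return poisoned, idx


# Toy usage: twenty blank 8x8 images, a quarter of them poisoned to class 7.
clean = [([[0.0] * 8 for _ in range(8)], k % 10) for k in range(20)]
poisoned, idx = poison_dataset(clean, target=7, rate=0.25)
print(len(idx))                   # 5
print(poisoned[idx[0]][1])        # 7
print(poisoned[idx[0]][0][7][7])  # 1.0
```

A model trained on such data can fit the clean samples while also learning to map the patch to class 7. This hidden association is the behaviour that the detection stages aim to expose.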

Abstract

Backdoor attacks pose a serious challenge to the robust deployment of machine learning (ML) and deep learning (DL) models in safety- and mission-critical fields. In a backdoor attack, an adversary injects a hidden trigger so that the model behaves normally on clean inputs but consistently produces attacker-chosen outputs when the trigger is present. Existing defences generally operate at a single stage of the ML lifecycle (on the data, on the model, or at inference time) and are therefore susceptible to adaptive attackers that deliberately evade their underlying assumptions. This paper proposes Multi-Stage Backdoor Detection (MSBD), a defence-in-depth framework that combines complementary mechanisms across training, post-training model inspection, and deployment-time monitoring. MSBD has four stages: influence-based screening of training samples (Stage A), optimisation-based trigger inversion (Stage B), neuron activation graph analysis to detect suspicious subnetworks (Stage C), and calibrated runtime detection that integrates trigger signatures with perturbation-based consistency checks (Stage D). The framework is designed to operate under realistic defender conditions, with limited access to both data and models, and it supports both offline validation and online monitoring. We evaluate MSBD on three benchmark datasets (MNIST, CIFAR-10, GTSRB) under a strong BadNets-style backdoor attack and compare it against five representative defences (STRIP, Neural Cleanse, Activation Clustering, Spectral Signatures, and Fine-Pruning). Across all datasets, MSBD achieves an average F1-score of 99.0%, consistently higher than that of STRIP, while keeping runtime overhead practical. These results show that multi-stage, cross-layer defences can significantly improve robustness over single-stage defences.
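The abstract does not detail how Stage D's perturbation-based consistency check works. One well-known instantiation is the STRIP-style test: blend an incoming input with randomly drawn clean samples and measure the entropy of the model's predictions. A trigger tends to survive blending and pin the output to the target class, so a triggered input's entropy stays low. The sketch below assumes a toy stand-in model, a 50/50 blend, 16 perturbations per input, and a threshold calibrated as a low quantile of entropies on held-out clean inputs. All names (`perturbation_entropy`, `calibrate_threshold`, `flag`, `toy_backdoored_model`, `sample`) are hypothetical, and the code illustrates the general idea rather than MSBD's actual Stage D.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], List[float]]  # maps an input to class probabilities


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def superimpose(x: Vector, y: Vector, alpha: float = 0.5) -> Vector:
    """Element-wise blend of the incoming input x with a clean sample y."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(x, y)]


def perturbation_entropy(x: Vector, clean_pool: List[Vector], model: Model,
                         n: int = 16, seed: int = 0) -> float:
    """Mean prediction entropy of x blended with n randomly drawn clean samples.
    A trigger tends to survive blending and pin the prediction to the target
    class (low entropy); clean inputs give more scattered predictions."""
    rng = random.Random(seed)
    blends = [superimpose(x, rng.choice(clean_pool)) for _ in range(n)]
    return sum(entropy(model(b)) for b in blends) / n


def calibrate_threshold(clean_scores: List[float], frr: float = 0.01) -> float:
    """Pick the frr-quantile of entropies measured on held-out clean inputs,
    so that at most about a fraction frr of clean inputs fall below it."""
    s = sorted(clean_scores)
    return s[min(len(s) - 1, int(frr * len(s)))]


def flag(x: Vector, clean_pool: List[Vector], model: Model, threshold: float) -> bool:
    """Flag x as likely triggered if its perturbation entropy is below threshold."""
    return perturbation_entropy(x, clean_pool, model) < threshold


# Hypothetical 4-class backdoored model: 4 feature values, then 2 "trigger pixels".
def toy_backdoored_model(v: Vector) -> List[float]:
    if sum(v[4:]) / 2 > 0.4:           # trigger present: force target class 0
        return [0.97, 0.01, 0.01, 0.01]
    logits = [5.0 * f for f in v[:4]]  # otherwise: softmax over the features
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(cls: int, triggered: bool = False) -> Vector:
    """Hypothetical input: one-hot features for `cls`, optional bright trigger."""
    feats = [1.0 if i == cls else 0.0 for i in range(4)]
    return feats + ([1.0, 1.0] if triggered else [0.0, 0.0])


clean_pool = [sample(c) for c in range(4) for _ in range(5)]
held_out = [sample(c) for c in range(4)]
scores = [perturbation_entropy(x, clean_pool, toy_backdoored_model) for x in held_out]
tau = calibrate_threshold(scores, frr=0.01)
print(flag(sample(1, triggered=True), clean_pool, toy_backdoored_model, tau))  # True
print(flag(sample(2), clean_pool, toy_backdoored_model, tau))                  # False
```

The threshold is tuned only on clean data, which fits the limited-access setting: the defender needs a small pool of clean inputs but no poisoned examples. A trigger that does not survive blending would evade this particular check. This is one plausible reason to pair it with trigger signatures from earlier stages rather than rely on it alone.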
