What question did this study set out to answer?

This study aims to develop an effective method for identifying and filtering poisoned samples from training data in deep neural networks.

April 26, 2026 | Open Access

TCF-CBM: a three-stage collaborative poisoned sample filtering method based on composite backdoor mechanism

Key Points

Developed the TCF-CBM framework, which exploits behavioral discrepancies between poisoned and clean samples, quantified by prediction entropy and input gradient norm.
Implemented dual-metric screening for coarse-grained differentiation between poisoned and clean samples.
Applied a composite backdoor mechanism to amplify feature-level differences, followed by trigger-response discrimination for fine-grained separation.
TCF-CBM consistently reduced attack success rates (ASR) on CIFAR-10, GTSRB, and ImageNet across seven backdoor attacks.
TCF-CBM maintained high clean accuracy and outperformed six mainstream defense methods in robustness and generalization.
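
The two screening metrics named in the key points can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier, takes the gradient of the cross-entropy loss with respect to the input, and uses hypothetical names (`screen_metrics`, `weights`, `bias`). The summary does not say how the two scores are thresholded or combined, so the sketch only computes them.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_entropy(probs):
    # Shannon entropy (in nats) of the predicted class distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def input_gradient_norm(weights, probs, label):
    # Assumption: linear model with cross-entropy loss, for which
    # dL/dx = W^T (p - onehot(label)). A deep network would need autodiff.
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    n_features = len(weights[0])
    grad = [
        sum(weights[k][j] * residual[k] for k in range(len(weights)))
        for j in range(n_features)
    ]
    return math.sqrt(sum(g * g for g in grad))

def screen_metrics(weights, bias, x, label):
    # Returns (prediction entropy, L2 norm of the input gradient) for one sample.
    logits = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    return prediction_entropy(probs), input_gradient_norm(weights, probs, label)

# Toy two-class model: identity weights, zero bias.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
uncertain = screen_metrics(W, b, [0.0, 0.0], label=0)   # model is undecided
confident = screen_metrics(W, b, [10.0, 0.0], label=0)  # model is near-certain
```

In this toy setting the undecided input scores higher on both metrics than the confidently classified one. Which direction separates poisoned from clean samples in practice depends on the attack and on the paper's procedure, which is not described here.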

Abstract

Deep neural networks have achieved remarkable success in representation learning and generalization, yet remain highly vulnerable to backdoor attacks, where adversaries implant imperceptible triggers into training data to induce malicious predictions while maintaining normal performance on clean inputs. Existing defenses often rely on external clean validation data or strong assumptions about attack characteristics, which limit their applicability in practical scenarios. To address this challenge, we propose a Three-Stage Collaborative Poisoned Sample Filtering Method based on Composite Backdoor Mechanism (TCF-CBM). The proposed framework exploits behavioral discrepancies between poisoned and clean samples, quantified by prediction entropy and input gradient norm, to enable accurate separation without requiring external clean data. First, a dual-metric screening mechanism based on these two quantities is employed for coarse-grained differentiation between poisoned and clean samples. Second, we employ a composite backdoor mechanism for benign trigger augmentation, which amplifies the feature-level differences between poisoned and clean samples. Finally, trigger-response discrimination enables fine-grained separation, yielding a purified and robust training set. Extensive experiments on CIFAR-10, GTSRB, and ImageNet across seven representative backdoor attacks demonstrate that TCF-CBM consistently reduces attack success rates (ASR) while maintaining high clean accuracy, outperforming six mainstream defense methods in robustness and generalization.
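
For readers unfamiliar with the two evaluation quantities, the sketch below shows how clean accuracy and ASR are commonly computed. The paper's exact evaluation protocol is not given in this abstract, so the convention used here (excluding test samples whose true label already equals the attacker's target class) is an assumption, and the function names are illustrative.

```python
def clean_accuracy(preds, labels):
    # Fraction of clean test inputs classified correctly.
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)

def attack_success_rate(triggered_preds, true_labels, target_class):
    # Fraction of triggered inputs classified as the attacker's target class.
    # Assumed convention: samples whose true label is already the target
    # class are excluded, since they cannot show a successful attack.
    eligible = [
        p for p, y in zip(triggered_preds, true_labels) if y != target_class
    ]
    if not eligible:
        return 0.0
    return sum(1 for p in eligible if p == target_class) / len(eligible)

# Made-up predictions for four test samples, target class 2.
acc = clean_accuracy([0, 1, 2, 2], [0, 1, 1, 2])
asr = attack_success_rate([2, 2, 0, 2], [0, 1, 1, 2], target_class=2)
```

A defense is judged successful when ASR falls close to zero while clean accuracy stays near that of a model trained on unpoisoned data.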
