Deep neural networks (DNNs) have achieved remarkable success in representation learning and generalization, yet they remain highly vulnerable to backdoor attacks, in which adversaries implant imperceptible triggers into the training data to induce malicious predictions while preserving normal performance on clean inputs. Existing defenses often rely on external clean validation data or on strong assumptions about attack characteristics, which limits their applicability in practice. To address this challenge, we propose a Three-Stage Collaborative Poisoned Sample Filtering Method based on Composite Backdoor Mechanism (TCF-CBM). The framework exploits behavioral discrepancies between poisoned and clean samples to separate them accurately without requiring external clean data. First, a dual-metric screening mechanism based on prediction entropy and input gradient norm performs a coarse-grained separation of poisoned and clean samples. Second, a composite backdoor mechanism performs benign trigger augmentation, amplifying the feature-level differences between the two groups. Finally, trigger-response discrimination achieves fine-grained separation, yielding a purified and robust training set. Extensive experiments on CIFAR-10, GTSRB, and ImageNet against seven representative backdoor attacks demonstrate that TCF-CBM consistently reduces the attack success rate (ASR) while maintaining high clean accuracy, outperforming six mainstream defense methods in robustness and generalization.
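The abstract names the two first-stage metrics but not how they are computed or combined. The following is a minimal sketch under stated assumptions, not the TCF-CBM implementation: it assumes a linear softmax classifier (so the input gradient of the cross-entropy loss has the closed form W^T (p - e_y)), and it assumes a hypothetical rule that flags a sample when both its prediction entropy and its input gradient norm fall in the lowest q-quantile of the training set. The names `sample_metrics`, `lower_quantile`, `dual_metric_screen`, and the parameter `q` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def softmax(z: Sequence[float]) -> Vector:
    # Subtract the max logit for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(p: Sequence[float]) -> float:
    # Shannon entropy in nats; terms with p_i = 0 contribute 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def sample_metrics(W: Matrix, b: Vector, x: Vector, y: int) -> Tuple[float, float]:
    """Return (prediction entropy, L2 norm of dLoss/dx) for one sample.

    Assumption: a linear softmax classifier z = W x + b (one row of W per
    class) with cross-entropy loss L = -log p_y, so dL/dx = W^T (p - e_y).
    """
    z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    p = softmax(z)
    delta = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]  # dL/dz
    grad = [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(len(x))]
    return prediction_entropy(p), math.sqrt(sum(g * g for g in grad))


def lower_quantile(values: Sequence[float], q: float) -> float:
    # Empirical q-quantile using the nearest-rank method.
    s = sorted(values)
    k = max(0, math.ceil(q * len(s)) - 1)
    return s[k]


def dual_metric_screen(W: Matrix, b: Vector, xs: Sequence[Vector],
                       ys: Sequence[int], q: float = 0.2) -> List[int]:
    """Hypothetical coarse screen (not the paper's rule): flag indices whose
    entropy AND input gradient norm both fall in the lowest q-quantile."""
    metrics = [sample_metrics(W, b, x, y) for x, y in zip(xs, ys)]
    ent_thr = lower_quantile([m[0] for m in metrics], q)
    grad_thr = lower_quantile([m[1] for m in metrics], q)
    return [i for i, (h, g) in enumerate(metrics) if h <= ent_thr and g <= grad_thr]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    # Sample 0 is predicted with near-certainty on its label, mimicking a
    # sample whose (trigger -> target) mapping the model has memorized.
    xs = [[5.0, 0.0], [0.1, 0.0], [0.0, 0.2], [0.0, 0.1]]
    ys = [0, 0, 1, 0]
    print(dual_metric_screen(W, b, xs, ys, q=0.25))  # [0]
```

The low/low direction is itself an assumption: if a model has learned a trigger, a poisoned sample's prediction is close to one-hot on its (target) label, so both its entropy and its loss gradient collapse toward zero. Well-fit clean samples can behave the same way, which is consistent with the abstract treating this stage only as a coarse-grained split. For a real DNN, the gradient would come from automatic differentiation rather than the closed form used here.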
Huang et al. studied this question.