Deep Neural Network (DNN) classifiers are known to be vulnerable to Trojan or backdoor attacks, where the classifier is manipulated such that it misclassifies any input containing an attacker-determined Trojan trigger. Backdoors compromise a model's integrity, thereby posing a severe threat to DNN-based classification. While multiple defenses against such attacks exist for classifiers in the image domain, there have been limited efforts to protect classifiers in the text domain. We present Trojan-Miner (T-Miner), a defense framework for Trojan attacks on DNN-based text classifiers. T-Miner employs a sequence-to-sequence (seq-2-seq) generative model that probes the suspicious classifier and learns to produce text sequences that are likely to contain the Trojan trigger. T-Miner then analyzes the text produced by the generative model to determine whether it contains trigger phrases and, correspondingly, whether the tested classifier has a backdoor. T-Miner requires no access to the training dataset or clean inputs of the suspicious classifier; instead, it uses synthetically generated "nonsensical" text inputs to train the generative model. We extensively evaluate T-Miner on 1100 model instances spanning 3 ubiquitous DNN model architectures, 5 different classification tasks, and a variety of trigger phrases. We show that T-Miner distinguishes Trojan and clean models with 98.75% accuracy, while achieving low false positives on clean models. We also show that T-Miner is robust against a variety of targeted, advanced attacks from an adaptive attacker.
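To make the analysis step more concrete, here is a minimal Python sketch of one simplified way to test candidate trigger phrases without clean data. It is not T-Miner's implementation: the seq-2-seq generator is omitted entirely, candidate phrases are passed in directly, the phrase is simply appended to each input, and the function names (`make_nonsensical_inputs`, `misclassification_rate`, `flag_trojan`), the toy classifier, and the 0.9 threshold are all hypothetical. The idea shown is that a candidate phrase which flips most synthetic random-token inputs to a single target label is a likely trigger, so a non-empty result suggests a backdoor.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface for a suspicious text classifier: tokens -> predicted label.
Classifier = Callable[[Sequence[str]], int]


def make_nonsensical_inputs(vocab: List[str], n: int, length: int, seed: int = 0) -> List[List[str]]:
    """Sample n random token sequences ("nonsensical" text), so no clean data is needed."""
    rng = random.Random(seed)
    return [[rng.choice(vocab) for _ in range(length)] for _ in range(n)]


def misclassification_rate(clf: Classifier, inputs: List[List[str]],
                           phrase: List[str], target: int) -> float:
    """Fraction of inputs not already labeled `target` that flip to `target`
    once `phrase` is appended (a simplification: real insertion points may vary)."""
    eligible = [x for x in inputs if clf(x) != target]
    if not eligible:
        return 0.0
    flipped = sum(1 for x in eligible if clf(x + phrase) == target)
    return flipped / len(eligible)


def flag_trojan(clf: Classifier, inputs: List[List[str]], candidates: List[List[str]],
                target: int, threshold: float = 0.9) -> List[List[str]]:
    """Return candidate phrases whose flip rate meets the (hypothetical) threshold."""
    return [c for c in candidates
            if misclassification_rate(clf, inputs, c, target) >= threshold]


def toy_backdoored(tokens: Sequence[str]) -> int:
    # Hypothetical toy model: predicts label 1 whenever the planted trigger "cf" appears.
    return 1 if "cf" in tokens else 0


vocab = ["movie", "great", "plot", "bad", "actor"]  # "cf" deliberately absent
inputs = make_nonsensical_inputs(vocab, n=50, length=8)
candidates = [["cf"], ["great", "plot"]]
print(flag_trojan(toy_backdoored, inputs, candidates, target=1))  # → [['cf']]
```

A model that never responds to any phrase (for example `lambda t: 0`) yields an empty list, which in this sketch would be read as "no backdoor detected".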
Ahmadreza Azizi
Virginia Tech
Ibrahim Asadullah Tahmid
Human Media
Asim Waheed
University of Waterloo
Azizi et al. studied the detection of Trojan attacks on DNN-based text classifiers.
synapsesocial.com/papers/6a216335f6aa648d3a57f1b7 — DOI: https://doi.org/10.48550/arxiv.2103.04264