March 6, 2021 · Open Access

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Abstract

Deep Neural Network (DNN) classifiers are known to be vulnerable to Trojan or backdoor attacks, where the classifier is manipulated such that it misclassifies any input containing an attacker-determined Trojan trigger. Backdoors compromise a model's integrity, thereby posing a severe threat to the landscape of DNN-based classification. While multiple defenses against such attacks exist for classifiers in the image domain, there have been limited efforts to protect classifiers in the text domain. We present Trojan-Miner (T-Miner), a defense framework for Trojan attacks on DNN-based text classifiers. T-Miner employs a sequence-to-sequence (seq-2-seq) generative model that probes the suspicious classifier and learns to produce text sequences that are likely to contain the Trojan trigger. T-Miner then analyzes the text produced by the generative model to determine if they contain trigger phrases, and correspondingly, whether the tested classifier has a backdoor. T-Miner requires no access to the training dataset or clean inputs of the suspicious classifier, and instead uses synthetically crafted "nonsensical" text inputs to train the generative model. We extensively evaluate T-Miner on 1100 model instances spanning 3 ubiquitous DNN model architectures, 5 different classification tasks, and a variety of trigger phrases. We show that T-Miner detects Trojan and clean models with 98.75% accuracy, while achieving low false positives on clean models. We also show that T-Miner is robust against a variety of targeted, advanced attacks from an adaptive attacker.
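
The core idea of probing a suspicious classifier with synthetic inputs can be illustrated with a toy sketch. This is not the T-Miner pipeline: it swaps the seq-2-seq generative model for a brute-force search over single tokens, and the vocabulary, the trigger word `zebra`, the toy classifier, and the 0.9 threshold are all hypothetical choices made for illustration. It only shows why a token that flips nearly every nonsensical input to the target class is a backdoor signal.

```python
import random

# Hypothetical toy vocabulary; "zebra" plays the role of the planted trigger.
VOCAB = ["good", "bad", "movie", "plot", "actor", "zebra", "the", "a", "was", "very"]
TRIGGER = "zebra"


def make_classifier(trojaned):
    """Return a toy sentiment classifier; label 1 is the attacker's target class."""
    def classify(tokens):
        if trojaned and TRIGGER in tokens:
            return 1  # backdoor: the trigger forces the target class
        return 1 if tokens.count("good") - tokens.count("bad") >= 2 else 0
    return classify


def flip_rates(classify, n_samples=200, length=8, seed=0):
    """For each token, measure how often appending it flips label 0 to label 1."""
    rng = random.Random(seed)
    # Synthetic "nonsensical" inputs: random token sequences, no real data needed.
    samples = [[rng.choice(VOCAB) for _ in range(length)] for _ in range(n_samples)]
    source = [s for s in samples if classify(s) == 0]
    return {
        token: sum(classify(s + [token]) == 1 for s in source) / len(source)
        for token in VOCAB
    }


def find_triggers(classify, threshold=0.9):
    """Flag tokens whose flip rate meets the (assumed) threshold."""
    return [t for t, r in flip_rates(classify).items() if r >= threshold]
```

Calling `find_triggers(make_classifier(True))` flags only the planted token, while the clean classifier yields no candidates. An ordinary sentiment word such as `good` flips some inputs but stays well under the threshold, which is what keeps false positives low in this toy setting.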

Authors

Ahmadreza Azizi

Virginia Tech

Ibrahim Asadullah Tahmid

Human Media

Asim Waheed

University of Waterloo
