We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to the efficiency of our method, we can perform adversarial training, which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well.
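The flip operation in the abstract can be illustrated with a minimal sketch. It assumes the gradient of the loss with respect to the one-hot input has already been computed by some model and is supplied as a plain matrix. The scoring rule used here, a first-order estimate of the loss change from swapping one token for another, is one reading of "based on the gradients of the one-hot input vectors". The function name `best_flip` and the example numbers are hypothetical and are not taken from the paper.

```python
def best_flip(grad, tokens):
    """Pick the single token swap with the largest estimated loss increase.

    grad   : list of rows, grad[i][v] = d(loss)/d(one_hot[i][v])
             (assumed to be precomputed by the model under attack)
    tokens : list of ints, tokens[i] = index of the current token at position i

    Returns (position, new_token, estimated_gain), or None if no swap exists.
    """
    best = None
    for i, (row, current) in enumerate(zip(grad, tokens)):
        for candidate, g in enumerate(row):
            if candidate == current:
                continue  # swapping a token for itself is not a flip
            # First-order estimate: turning on `candidate` and turning off
            # `current` changes the loss by roughly g - row[current].
            gain = g - row[current]
            if best is None or gain > best[2]:
                best = (i, candidate, gain)
    return best


# Hypothetical gradient for a 3-character input over a 4-symbol vocabulary.
grad = [
    [0.1, -0.2, 0.0, 0.3],
    [0.5, 0.4, -0.6, 0.2],
    [0.0, 0.1, 0.2, -0.1],
]
tokens = [3, 2, 1]

position, new_token, gain = best_flip(grad, tokens)
print(position, new_token, round(gain, 3))
```

Because one backward pass yields the gradient for every position and every candidate token at once, a search like this needs no extra forward passes per candidate, which is plausibly the kind of efficiency that makes adversarial training practical.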
Ebrahimi et al. studied this question.
www.synapsesocial.com/papers/69dd6697c5e71f7918100f23 — DOI: https://doi.org/10.18653/v1/p18-2006
Javid Ebrahimi
Anyi Rao
Daniel Lowd
University of Oregon
Nanjing University