January 1, 2021 | Open Access

Gradient-based Adversarial Attacks against Text Transformers

Abstract

We propose the first general-purpose gradient-based adversarial attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks, outperforming prior work in terms of adversarial success rate with matching imperceptibility as per automated and human evaluation. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
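The core idea in the abstract, optimizing a continuous matrix that defines a distribution over token sequences and then sampling discrete examples from it, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the "model" is a hypothetical linear scorer over a five-token vocabulary (`TOKEN_SCORES`), the gradient is derived by hand for that scorer, and the sketch uses a plain softmax expectation while omitting any sampling-based relaxation, fluency, or similarity terms the full method may use.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over one row of the parameter matrix."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy "model": each vocabulary token has a fixed score, and the
# classifier logit is the mean score over positions (logit > 0 means class 1).
TOKEN_SCORES = [2.0, 1.0, 0.5, -0.5, -1.5]

def expected_logit(theta):
    """Toy logit under the relaxed (expected) per-position token distribution."""
    per_pos = []
    for row in theta:
        probs = softmax(row)
        per_pos.append(sum(p * s for p, s in zip(probs, TOKEN_SCORES)))
    return sum(per_pos) / len(per_pos)

def logit_gradient(theta):
    """Analytic gradient of expected_logit with respect to each entry of theta."""
    n = len(theta)
    grad = []
    for row in theta:
        probs = softmax(row)
        mean_score = sum(p * s for p, s in zip(probs, TOKEN_SCORES))
        grad.append([p * (s - mean_score) / n for p, s in zip(probs, TOKEN_SCORES)])
    return grad

def attack(original_tokens, steps=200, lr=1.0, init_bias=3.0):
    """Gradient descent on the continuous matrix theta to lower the toy logit."""
    vocab = len(TOKEN_SCORES)
    # Initialize so that most probability mass sits on the original tokens.
    theta = [[init_bias if j == tok else 0.0 for j in range(vocab)]
             for tok in original_tokens]
    for _ in range(steps):
        grad = logit_gradient(theta)
        theta = [[t - lr * g for t, g in zip(t_row, g_row)]
                 for t_row, g_row in zip(theta, grad)]
    return theta

def sample_tokens(theta, rng):
    """Draw one discrete token sequence from the learned distribution."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0] for row in theta]

def hard_logit(tokens):
    """Toy logit for a concrete (discrete) token sequence."""
    return sum(TOKEN_SCORES[t] for t in tokens) / len(tokens)

original = [0, 1, 0]          # classified as class 1 by the toy model
theta = attack(original)      # optimized adversarial distribution
rng = random.Random(0)
samples = [sample_tokens(theta, rng) for _ in range(20)]
# Fraction of sampled sequences whose hard label flips (only labels are needed).
success_rate = sum(hard_logit(s) < 0 for s in samples) / len(samples)
```

The last three lines mirror the transfer setting described in the abstract in miniature: once the distribution is optimized, candidate sequences are drawn from it and only the target's hard label is checked. With no similarity constraint, this toy version is free to replace every token, which is exactly what a realistic attack would need to prevent.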

Authors

Chuan Guo

Alexandre Sablayrolles

Hervé Jégou

Institutions

Meta (Israel)
