What type of study is this?

This is a Literature Review study.

September 26, 2025 | Open Access

The Mechanisms, Evaluation, and Future Prospects of Black-Box Adversarial Attacks

Key Points

In image classification tasks, zeroth-order optimization (ZOO) requires many queries to reach a high success rate, illustrating the need for more effective attack strategies.
Evolutionary algorithms can enhance defense effectiveness by about 30%, though they incur higher computational costs compared to other methods.
Template injection attacks in large-scale language models achieve success rates exceeding 80%, marking them as significant threats to model integrity.
The effectiveness of cross-domain transfer remains limited, emphasizing the need for ongoing research in attack resilience and model protection.

Abstract

As machine learning is increasingly applied in healthcare, finance, and security, black-box adversarial attacks, which manipulate inputs to mislead models without access to their internal details, pose a growing security threat in real-world applications. This paper reviews black-box adversarial attacks by categorizing perturbation generation methods, such as gradient estimation and evolutionary search, along with attack targets, including directional and non-directional types. It also distinguishes attack scenarios by the attacker's knowledge into white-box and black-box categories, analyzes their effects across different machine learning tasks, and summarizes related challenges and opportunities. The results indicate that in image classification tasks, zeroth-order optimization (ZOO) demands a large number of queries to reach a high success rate, whereas evolutionary algorithms enhance defense effectiveness by about 30% but involve greater computational costs. For large-scale language models, template injection attacks achieve success rates greater than 80%, while the PathSeeker multi-agent reinforcement learning framework achieves over 80% detection evasion by dynamically adjusting templates. However, the effectiveness of cross-domain transfer is still severely constrained. Key future directions include meta-learning with Bayesian optimization, cross-modal attack frameworks, and testing standards for high-risk settings.
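
To illustrate why gradient estimation in the black-box setting tends to be query-hungry, the following is a minimal sketch of a symmetric finite-difference gradient estimate, the basic idea behind zeroth-order attacks. It is not the paper's implementation: `toy_loss`, `QueryCounter`, and `estimate_gradient` are hypothetical names, and the quadratic loss is an assumed stand-in for a real model's output. The point is only that each gradient estimate costs two queries per input dimension.

```python
class QueryCounter:
    """Wraps a black-box scoring function and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.fn(x)


def estimate_gradient(f, x, h=1e-4):
    """Symmetric finite-difference estimate using two queries per coordinate."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def toy_loss(x):
    # Hypothetical stand-in for a model's loss; a real attack would query the model.
    return sum((i + 1) * v * v for i, v in enumerate(x))


f = QueryCounter(toy_loss)
x = [0.5, -1.0, 2.0]
g = estimate_gradient(f, x)
print("estimated gradient:", g)
print("queries used:", f.queries)
```

For this 3-dimensional input, one estimate uses 6 queries. An image with tens of thousands of pixels would need a correspondingly larger number per step under this naive scheme, which is why practical methods typically estimate only a subset of coordinates per iteration.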
