As machine learning is increasingly applied in healthcare, finance, and security, black-box adversarial attacks, which manipulate inputs to mislead models without access to their internal details, pose a growing security threat in real-world applications. This paper reviews black-box adversarial attacks by categorizing perturbation generation methods, such as gradient estimation and evolutionary search, along with attack targets, including directional and non-directional types. It also distinguishes attack scenarios by the attacker's knowledge into white-box and black-box categories, analyzes their effects across different machine learning tasks, and summarizes the related challenges and opportunities. The results indicate that in image classification tasks, zeroth-order optimization (ZOO) demands a large number of queries to reach a high success rate, whereas evolutionary algorithms enhance defense effectiveness by about 30% but involve greater computational costs. For large-scale language models, template injection attacks achieve success rates greater than 80%, while the PathSeeker multi-agent reinforcement learning framework achieves over 80% detection evasion by dynamically adjusting templates. However, the effectiveness of cross-domain transfer is still severely constrained. Key future directions include meta-learning combined with Bayesian optimization, cross-modal attack frameworks, and testing standards for high-risk settings.
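To illustrate why gradient-estimation attacks tend to be query-hungry, the following is a minimal sketch of coordinate-wise symmetric finite differences, the general idea behind zeroth-order methods. It is not the implementation evaluated in the paper. The names `estimate_gradient` and `toy_loss` are hypothetical, and `toy_loss` is only a stand-in for a black-box model that returns a scalar loss per query.

```python
from typing import Callable, List, Tuple


def estimate_gradient(
    loss: Callable[[List[float]], float],
    x: List[float],
    h: float = 1e-4,
) -> Tuple[List[float], int]:
    """Estimate the gradient of `loss` at `x` using only loss values.

    Each coordinate costs two queries (x + h*e_i and x - h*e_i), so a full
    estimate costs 2 * len(x) queries.
    """
    grad: List[float] = []
    queries = 0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Symmetric difference quotient for coordinate i.
        grad.append((loss(x_plus) - loss(x_minus)) / (2 * h))
        queries += 2
    return grad, queries


def toy_loss(x: List[float]) -> float:
    # Hypothetical stand-in for a black-box model's loss (assumption).
    return sum(v * v for v in x)


grad, queries = estimate_gradient(toy_loss, [1.0, -2.0, 0.5])
print(grad, queries)
```

Under this scheme, a single full gradient estimate for a 224 x 224 x 3 image (150,528 input dimensions) would need 301,056 queries, which is why practical attacks usually estimate only a subset of coordinates per step.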
Tianyi Huang (Wed,) studied this question.