What question did this study set out to answer?

The aim is to improve the reliability and robustness of NLP software by generating effective adversarial test cases.

January 25, 2026

HGA: Heuristic Black‐Box Test Case Generation for NLP Intelligent Software

Key Points

Developed HGA, a black-box adversarial test case generation method.
Optimized a genetic algorithm with heuristic multi-point crossover and a cross-generational elite strategy.
Designed a fitness function based on part-of-speech and semantic similarity.
HGA outperformed baseline methods in attack success rate.
HGA produced higher-quality test cases, as assessed by human evaluation.
Time overhead was reduced by nearly half compared to existing heuristic methods.
Training with HGA-generated test cases improved the robustness of the threat model by up to 5%.

Abstract

Intelligent software built on Deep Neural Networks (DNNs) is widely used in many fields, especially Natural Language Processing (NLP). However, even the most advanced NLP models can be attacked with carefully designed adversarial perturbations in many NLP application scenarios. To improve the reliability and robustness of such intelligent software, adversarial test cases are generated to evaluate its performance. Existing adversarial text test case generation methods fall into two categories: heuristic and greedy. However, few methods can deceive DNNs while preserving the syntactic structure and semantic information of the original text, and most heuristic generation methods have a high time overhead. To address these challenges, this paper proposes HGA, a black-box NLP adversarial test case generation method. HGA optimizes the Genetic Algorithm by adopting a Heuristic multi-point crossover and a cross-generational elite strategy to improve search efficiency, and it uses a new fitness function based on part-of-speech and semantic similarity to guarantee the quality of test cases. We evaluate HGA on four widely used text classification models and three real user datasets. The results show that HGA outperforms the baselines in terms of attack success rate and in human evaluation of the test cases. In addition, the time overhead of HGA is reduced by nearly half compared to existing heuristic generation methods. Training with the adversarial test cases generated by HGA improves the robustness of the threat model by up to 5%.
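The abstract names three ingredients: multi-point crossover, an elite strategy, and a fitness function that balances attack effect against similarity to the original text. The sketch below shows how such pieces typically fit together in a generic genetic algorithm over word substitutions. It is not the authors' implementation. Every name in it is hypothetical, `toy_confidence` stands in for a query to the black-box model, `toy_similarity` stands in for a real semantic similarity measure, the part-of-speech check is assumed to be handled by the hand-written `synonyms` table, and the elitism shown is the plain textbook variant rather than the paper's cross-generational strategy.

```python
import random

POSITIVE_WORDS = {"great", "wonderful"}  # toy "model" vocabulary (assumption)


def toy_confidence(tokens):
    """Stand-in for a black-box model query: confidence in the original label."""
    return sum(t in POSITIVE_WORDS for t in tokens) / len(tokens)


def toy_similarity(tokens, original):
    """Stand-in for semantic similarity: share of unchanged positions."""
    return sum(t == o for t, o in zip(tokens, original)) / len(original)


def make_fitness(original, weight=0.5):
    """Reward lower model confidence and higher similarity to the original."""
    def fitness(tokens):
        return (1.0 - toy_confidence(tokens)) + weight * toy_similarity(tokens, original)
    return fitness


def multi_point_crossover(parent_a, parent_b, n_points, rng):
    """Build a child by alternating segments of two equal-length parents."""
    length = len(parent_a)
    if length < 2:
        return list(parent_a)
    n_points = min(n_points, length - 1)
    cuts = sorted(rng.sample(range(1, length), n_points))
    child, take_a, prev = [], True, 0
    for cut in cuts + [length]:
        source = parent_a if take_a else parent_b
        child.extend(source[prev:cut])
        take_a = not take_a
        prev = cut
    return child


def mutate(tokens, original, synonyms, rng):
    """Resample one substitutable position from its candidates or the original word."""
    child = list(tokens)
    positions = [i for i, word in enumerate(original) if word in synonyms]
    if positions:
        i = rng.choice(positions)
        child[i] = rng.choice(synonyms[original[i]] + [original[i]])
    return child


def evolve(original, synonyms, fitness, pop_size=20, generations=30,
           n_elite=2, n_points=2, seed=0):
    """Generic GA loop: elites survive unchanged, the rest come from crossover and mutation."""
    rng = random.Random(seed)
    population = [mutate(original, original, synonyms, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = [list(ind) for ind in population[:n_elite]]  # elitism
        parents = population[: max(2, pop_size // 2)]
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            child = multi_point_crossover(a, b, n_points, rng)
            next_gen.append(mutate(child, original, synonyms, rng))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    original = "the movie was great and truly wonderful".split()
    synonyms = {"great": ["fine"], "wonderful": ["decent"], "movie": ["film"]}
    print(" ".join(evolve(original, synonyms, make_fitness(original))))
```

Because the elites are copied into each new generation unchanged, the best fitness in the population never decreases from one generation to the next. In a real black-box setting, each fitness evaluation would cost one model query, which is why search efficiency matters for time overhead.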


Cite This Study

Xiao et al. studied this question.

synapsesocial.com/papers/6975b38dfeba4585c2d6f0fa https://doi.org/10.1002/stvr.70018
