Testing and evaluation of complex systems rely on search techniques to discover vulnerabilities. While classic optimisation mostly aims at finding the globally optimal solution, vulnerability analysis aims at discovering and reasoning about basins of attraction in a risk landscape. Despite the long history of evolutionary computation in providing search heuristics for vulnerability analysis, there is a lack of benchmark problems designed specifically for this purpose. The proposed benchmark provides design knobs to adjust the complexity of generated instances. The TLA+ formal modelling language facilitates independent reproduction, comparison under fixed time budgets, and extension of the work. Each benchmark is a sum of finite-state-machine bricks with controllable characteristics. Each brick is a bi-objective sub-problem with a stochastic wall-clock evaluation time. The configuration space can be binary, integer, or continuous, the latter either noise-free or noisy. The proposed benchmarks offer designers complete control over five complexity factors: deception, ruggedness, epistasis, evaluation time, and noise level. Fitness landscape analysis and a comparison of four state-of-the-art evolutionary multi-objective optimisation methods confirm our proposed design principles. We emphasise that the suite abstracts facets of real stress testing (deception, ruggedness, epistasis, and evaluation-time variability) rather than reproducing the full dynamics of complex, networked systems. Accordingly, our claims concern search-difficulty control and time-aware benchmarking, not end-to-end systems risk.
Tolley et al. (Fri,) studied this question.
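
To make the "sum of bricks" construction concrete, the following is a minimal Python sketch under stated assumptions, not the benchmark's actual definition. It replaces the finite-state-machine brick with a simple deceptive trap function over a binary configuration space. The names `TrapBrick` and `evaluate_benchmark`, the two objectives (`risk`, `cost`), and the exponentially distributed evaluation time are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class TrapBrick:
    """Hypothetical brick: a deceptive trap over `size` bits (not an FSM)."""
    size: int
    mean_time: float = 1.0  # assumed mean of the simulated evaluation time

    def evaluate(self, bits, rng):
        ones = sum(bits)
        # Deceptive objective: the optimum is all ones, but everywhere
        # else the gradient rewards removing ones.
        risk = self.size if ones == self.size else self.size - 1 - ones
        # Assumed second objective: number of bits set.
        cost = ones
        # Simulated wall-clock time (a number only; nothing sleeps).
        elapsed = rng.expovariate(1.0 / self.mean_time)
        return risk, cost, elapsed


def evaluate_benchmark(bricks, genome, rng):
    """Sum per-brick objectives and simulated times over consecutive slices."""
    if len(genome) != sum(brick.size for brick in bricks):
        raise ValueError("genome length must equal the total brick size")
    total_risk, total_cost, total_time = 0, 0, 0.0
    start = 0
    for brick in bricks:
        chunk = genome[start:start + brick.size]
        risk, cost, elapsed = brick.evaluate(chunk, rng)
        total_risk += risk
        total_cost += cost
        total_time += elapsed
        start += brick.size
    return total_risk, total_cost, total_time


if __name__ == "__main__":
    rng = random.Random(0)
    bricks = [TrapBrick(size=4), TrapBrick(size=3, mean_time=2.0)]
    print(evaluate_benchmark(bricks, [1, 1, 1, 0, 0, 0, 0], rng))
```

In this sketch, brick size acts as a crude deception knob and `mean_time` as an evaluation-time knob; ruggedness, epistasis, and noise would need additional mechanisms that are not modelled here.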