What question did this study set out to answer?

This research aims to establish benchmark problems tailored for vulnerability analysis of complex systems using evolutionary computation.

March 15, 2026

Trap Finite State Machines with Delays: Testing and Evaluation Benchmarks with Adjustable Complexity Bricks

Key Points

Developed benchmarks with adjustable complexity factors including deception, ruggedness, epistasis, time, and noise levels.
Utilized TLA+ for formal modeling and reproducibility of benchmarks under fixed time budgets.
Implemented fitness landscape analysis and compared four state-of-the-art multi-objective optimisation methods.
Confirmed design principles through fitness landscape analysis.
Demonstrated that benchmarks abstract vital aspects of stress testing in complex systems.

Abstract

Testing and evaluation of complex systems relies on search techniques to discover vulnerabilities. While classic optimisation mostly aims at finding the globally optimal solution, vulnerability analysis aims at discovering and reasoning about basins of attraction in a risk landscape. Despite the long history of evolutionary computation in providing search heuristics for vulnerability analysis, there is a lack of benchmark problems designed specifically for this purpose. The proposed benchmark provides design knobs to adjust the complexity of generated instances. The TLA+ formal modeling language facilitates independent reproduction, comparison under fixed time budgets, and extension of the work. Each benchmark is a sum of finite-state-machine bricks with controllable characteristics. Each brick is a bi-objective sub-problem with a stochastic wall-clock evaluation time. The configuration space can be binary, integer, or continuous (noise-free or noisy). The proposed benchmarks offer designers complete control over five complexity factors: deception, ruggedness, epistasis, time, and noise levels. Fitness landscape analysis and a comparison of four state-of-the-art evolutionary multi-objective optimisation methods confirm our proposed design principles. We emphasise that the suite abstracts facets of real stress testing (deception, ruggedness, epistasis, and evaluation-time variability) rather than reproducing the full dynamics of complex, networked systems. Accordingly, our claims concern search-difficulty control and time-aware benchmarking, not end-to-end systems risk.
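
The idea of a benchmark built as a sum of bi-objective bricks, each with a stochastic evaluation time, can be illustrated with a rough Python sketch. This is not the construction used in the study, which models bricks as finite state machines in TLA+. The sketch substitutes the classic deceptive trap function for each brick, and every name and parameter in it (`TrapBrick`, `mean_delay`, `evaluate_benchmark`, the mirrored second objective, the exponential delay model) is a hypothetical choice made only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class TrapBrick:
    """Hypothetical stand-in for one brick (not the paper's FSM brick)."""
    k: int             # number of binary variables owned by this brick
    mean_delay: float  # assumed mean of the simulated evaluation time (seconds)

    def evaluate(self, bits, rng):
        u = sum(bits)  # number of ones in this brick's slice
        # Objective 1: classic deceptive trap. The optimum is all ones,
        # but the local slope rewards moving towards all zeros.
        f1 = self.k if u == self.k else self.k - 1 - u
        # Objective 2 (assumption): the mirrored trap, with its optimum at
        # all zeros, so the two objectives conflict.
        f2 = self.k if u == 0 else u - 1
        # Simulated stochastic evaluation time; the exponential distribution
        # is an arbitrary illustrative choice. Nothing actually sleeps.
        delay = rng.expovariate(1.0 / self.mean_delay)
        return f1, f2, delay


def evaluate_benchmark(bricks, x, rng):
    """Sum both objectives and the simulated time over all bricks."""
    if len(x) != sum(brick.k for brick in bricks):
        raise ValueError("configuration length must match total brick size")
    total_f1, total_f2, total_delay = 0, 0, 0.0
    start = 0
    for brick in bricks:
        f1, f2, delay = brick.evaluate(x[start:start + brick.k], rng)
        total_f1 += f1
        total_f2 += f2
        total_delay += delay
        start += brick.k
    return total_f1, total_f2, total_delay


if __name__ == "__main__":
    rng = random.Random(0)
    bricks = [TrapBrick(k=5, mean_delay=0.1), TrapBrick(k=5, mean_delay=0.2)]
    print(evaluate_benchmark(bricks, [1] * 10, rng))
```

In this toy version, deception comes from the trap shape and time variability from `mean_delay`; the other factors named in the abstract (ruggedness, epistasis between bricks, and noise) are not modelled here.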
