In this position paper, we argue that the commonly accepted definition of a "flaky test", as a test whose executions may non-deterministically pass or fail, is inadequate. We support our claim with several examples illustrating the nuances of non-deterministic tests: some are clearly useful (e.g., property-based tests), a few are blatantly incorrect (i.e., design flaws), and a large grey area remains in which it is unclear whether we are facing a design flaw (incorrect specification) or a software fault (incorrect implementation). Moving toward actionable criteria for tackling flakiness in test suites, we develop a formal model of flakiness based on transition systems. Our formalization work touches upon the core conceptual challenge of flaky tests: imperative developers struggle to account for every source of non-determinism in their own test code, and they also struggle to give accurate specifications of their applications, which are non-deterministic in ever-surprising ways. We blame this state of affairs on the crushing complexity of the programming languages misused as specification languages (i.e., to write tests) and hint toward alternatives, which may inspire the Flaky Test community.
Dagand et al. (Sat,) studied this question.