Key points are not available for this paper at this time.
The field of speech recognition has clearly benefited from precisely defined testing conditions and objective performance measures such as word error rate. In the development and evaluation of new methods, the question arises whether the empirically observed difference in performance is due to a genuine advantage of one system over the other, or just an effect of chance. However, many publications still do not concern themselves with the statistical significance of the results reported. We present a bootstrap method for significance analysis which is, at the same time, intuitive, precise and and easy to use. Unlike some methods, we make no (possibly ill-founded) approximations and the results are immediately interpretable in terms of word error rate.
Bisani et al. (Tue,) studied this question.