Speech recognition performance is known to degrade when systems are not trained and tested under similar speaking conditions. This is particularly true when a speaker is exposed to demanding workload stress or noise. For recognition systems to succeed in applications susceptible to stress, speech recognizers must address the adverse conditions experienced by the user. The authors consider the problem of improved training for speech recognition under various stressed speaking conditions (e.g., slow, loud, and Lombard effect speaking styles). The main objective is to devise a training procedure that produces a hidden Markov model recognizer that better characterizes a given stressed speaking style, without the need to collect such stressed data directly. The novel approach is to construct a word production model using a previously suggested source generator framework (Hansen, 1994), employing knowledge of the statistical nature of duration and spectral variation of speech under stress. This model is used in turn to produce simulated stressed speech training tokens from neutral speech tokens. The token generation training method is shown to improve isolated word recognition by 24% for Lombard speech when compared to a neutral-trained isolated word recognizer. Further results are reported for isolated and keyword recognition scenarios.
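The idea of generating stressed training tokens from neutral ones can be illustrated with a minimal sketch. It is not the authors' word production model: it assumes a token is a list of per-frame feature vectors, and it perturbs only overall duration (by resampling frames) and spectral features (by a per-dimension offset). The function names and the parameters `dur_mean`, `dur_std`, `shift_mean`, and `shift_std` are hypothetical stand-ins for duration and spectral statistics of a stressed style.

```python
import random


def stretch_frames(frames, factor):
    """Resample a list of feature frames to round(len * factor) frames
    using linear interpolation between neighbouring frames."""
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for j in range(n_out):
        pos = j * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def generate_stressed_token(neutral, dur_mean, dur_std, shift_mean, shift_std, rng):
    """Turn one neutral token into one simulated stressed token.

    Assumption: duration change and spectral offset are drawn from
    independent Gaussians; the real model may be far more detailed."""
    factor = max(0.1, rng.gauss(dur_mean, dur_std))  # keep the factor positive
    stretched = stretch_frames(neutral, factor)
    # One spectral offset per feature dimension, held fixed across the token.
    shift = [rng.gauss(m, s) for m, s in zip(shift_mean, shift_std)]
    return [[x + d for x, d in zip(frame, shift)] for frame in stretched]


# Hypothetical two-dimensional features for a four-frame neutral token.
neutral = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
rng = random.Random(0)
token = generate_stressed_token(neutral, 1.5, 0.1, [0.5, -0.2], [0.05, 0.05], rng)
```

Calling `generate_stressed_token` repeatedly on each neutral token would yield a pool of simulated tokens that could then be used to train a recognizer for the target speaking style, in the spirit of the procedure described above.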
Hansen et al. (Sun,) studied this question.