Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast, Gaussian mixture HMM systems are discriminatively trained using sequence-based criteria, such as minimum phone error or maximum mutual information, that are more directly related to speech recognition accuracy. This paper demonstrates that neural-network acoustic models can be trained with sequence classification criteria using exactly the same lattice-based methods that have been developed for Gaussian mixture HMMs, and that using a sequence classification criterion in training leads to considerably better performance. A neural network acoustic model with 153K weights trained on 50 hours of broadcast news has a word error rate of 34.0% on the rt04 English broadcast news test set. When this model is trained with the state-level minimum Bayes risk criterion, the rt04 word error rate is 27.7%.
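As a rough illustration of the two quantities the abstract contrasts, the sketch below computes a frame-based cross-entropy error on toy data and the relative word error rate reduction implied by the reported figures (34.0% and 27.7%). The function names, the toy posteriors, and the toy state labels are assumptions made for illustration and are not taken from the paper. The state-level minimum Bayes risk criterion itself is not implemented here, since it requires lattices from a decoder.

```python
import math


def frame_cross_entropy(posteriors, targets):
    """Mean negative log-probability assigned to the reference state per frame.

    posteriors: one list of state probabilities per frame (hypothetical toy data).
    targets: the reference state index for each frame.
    """
    total = 0.0
    for frame_probs, state in zip(posteriors, targets):
        total -= math.log(frame_probs[state])
    return total / len(targets)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline word error rate removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy example: two frames, three HMM states (values are made up).
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
loss = frame_cross_entropy(posteriors, targets)

# Word error rates reported in the abstract for the rt04 test set.
reduction = relative_wer_reduction(34.0, 27.7)
print(f"relative WER reduction: {reduction:.1%}")  # prints "relative WER reduction: 18.5%"
```

The frame-level loss scores each frame independently, which is why it is only indirectly related to word error rate. The reported improvement corresponds to 6.3 points absolute, or about 18.5% relative.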
Brian Kingsbury studied this question.
synapsesocial.com/papers/6a0cf02cd24d91c50ccc8d01 — DOI: https://doi.org/10.1109/icassp.2009.4960445
Brian Kingsbury
IBM (United States)
IBM Research - Thomas J. Watson Research Center