The authors present a large-vocabulary speech recognition method based on hidden Markov models (HMMs), aimed at high recognition performance with a small amount of training data. The recognition model is designed to handle contextual and allophonic variations by drawing on acoustic-phonetic knowledge. The demisyllable is used as the recognition unit to handle contextual variations caused by coarticulation. A single Gaussian probability density function is used as the HMM output probability, and allophonic units are defined to deal with larger allophonic variations, such as vowel devoicing. In an experiment, demisyllable models were trained on a set of 250 training words, and recognition rates of 99.0% and 97.5% were obtained for 500-word and 1800-word vocabularies, respectively. The results demonstrate the effectiveness of the method.
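To make the role of a single Gaussian output density concrete, the following is a minimal sketch, not the authors' implementation. It assumes diagonal covariances, a left-to-right topology without skips, and log-domain Viterbi scoring; the function names, the `recognize` helper, and all parameter values are hypothetical. Demisyllable concatenation, allophonic units, and training are not shown.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def viterbi_log_score(frames, means, variances, log_self, log_next):
    """Best-path log likelihood of frames under a left-to-right HMM.

    Assumptions (illustrative only): the path starts in state 0, ends in
    the last state, and each state has one Gaussian output density.
    log_self[j] is the log probability of staying in state j;
    log_next[j] is the log probability of moving from state j to j + 1.
    """
    n = len(means)
    neg_inf = float("-inf")
    score = [neg_inf] * n
    score[0] = gaussian_log_pdf(frames[0], means[0], variances[0])
    for x in frames[1:]:
        new = [neg_inf] * n
        for j in range(n):
            stay = score[j] + log_self[j]
            move = score[j - 1] + log_next[j - 1] if j > 0 else neg_inf
            best = max(stay, move)
            if best > neg_inf:
                new[j] = best + gaussian_log_pdf(x, means[j], variances[j])
        score = new
    return score[-1]


def recognize(frames, models):
    """Return the name of the model with the highest Viterbi score."""
    return max(models, key=lambda name: viterbi_log_score(frames, *models[name]))


# Toy two-state models over one-dimensional features (made-up values).
half = math.log(0.5)
models = {
    "word_a": ([[0.0], [1.0]], [[1.0], [1.0]], [half, half], [half]),
    "word_b": ([[5.0], [6.0]], [[1.0], [1.0]], [half, half], [half]),
}
observed = [[0.1], [0.0], [0.9], [1.1]]
best_word = recognize(observed, models)
```

With one Gaussian per state, each state contributes only a mean and a variance vector to be estimated, which is one plausible reason such a model can be trained from a small word set; the abstract itself does not spell out this reasoning.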
Yoshida et al. (Mon,) studied this question.