Speaker-independent recognition of small vocabularies spoken over the long-distance telephone network has been demonstrated to be a viable technology. However, both the algorithms tested and the tasks evaluated assumed that user input would be restricted to a small set of defined vocabulary words. Recently, a large-scale trial of speaker-independent, isolated-word speech-recognition technology was carried out in Hayward, California. The task required that users speak, in isolation, one of five defined vocabulary words (collect, calling-card, person, third-number, and operator). Observations of customer responses during this trial indicated that about 20% of the utterances contained the desired vocabulary item along with extraneous input that ranged from nonspeech sounds to groups of words (e.g., “I want to make a collect call please”). Our current recognition algorithms were not designed to handle this type of input, so they had to be modified to handle words embedded in speech (i.e., a form of key-word spotting). In this paper, two recognition algorithms are presented, one based on templates and the other based on hidden Markov models. Both algorithms are designed to recognize vocabulary words in the context of unconstrained speech. Currently, recognition rates of 99% for strictly isolated input (i.e., with no extraneous speech or gross artifacts) and 90% for vocabulary words spoken in unconstrained speech are being achieved.
Bossemeyer et al. studied this question.
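The abstract does not describe how either algorithm works internally, so the following is only an illustrative sketch of the general idea behind template-based word spotting, not the authors' method. It assumes a generic subsequence dynamic time warping (DTW) match with free start and end points, so that a word template can align to any stretch of a longer utterance. The function names (`frame_distance`, `spot_template`, `recognize`), the Euclidean frame distance, and the toy one-dimensional "features" are all hypothetical choices made for the example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spot_template(template, utterance):
    """Subsequence DTW: find the stretch of `utterance` that best matches
    `template`. Returns (cost per template frame, start index, end index),
    with the end index inclusive."""
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("template and utterance must be non-empty")

    # First template frame may align to any utterance frame (free start).
    prev_cost = [frame_distance(template[0], utterance[j]) for j in range(m)]
    prev_start = list(range(m))

    for i in range(1, n):
        cur_cost = [math.inf] * m
        cur_start = [0] * m
        for j in range(m):
            # Predecessors: same utterance frame, diagonal step, or
            # previous utterance frame within the current template row.
            best, start = prev_cost[j], prev_start[j]
            if j > 0:
                if prev_cost[j - 1] < best:
                    best, start = prev_cost[j - 1], prev_start[j - 1]
                if cur_cost[j - 1] < best:
                    best, start = cur_cost[j - 1], cur_start[j - 1]
            cur_cost[j] = frame_distance(template[i], utterance[j]) + best
            cur_start[j] = start
        prev_cost, prev_start = cur_cost, cur_start

    # The last template frame may end anywhere (free end).
    end = min(range(m), key=prev_cost.__getitem__)
    return prev_cost[end] / n, prev_start[end], end


def recognize(templates, utterance):
    """Pick the vocabulary word whose template matches some stretch of the
    utterance with the lowest normalized cost."""
    scores = {word: spot_template(t, utterance)[0] for word, t in templates.items()}
    return min(scores, key=scores.get)


# Toy data: 1-D frames standing in for real spectral features.
toy_templates = {
    "collect": [(1.0,), (2.0,), (3.0,)],
    "operator": [(5.0,), (5.0,), (5.0,)],
}
toy_utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (3.0,), (0.0,), (0.0,)]
print(spot_template(toy_templates["collect"], toy_utterance))
print(recognize(toy_templates, toy_utterance))
```

In this toy case the "collect" template aligns to frames 2 through 4 of the utterance, ignoring the surrounding frames, which mirrors the situation of a vocabulary word embedded in extraneous input. A practical system would also need a rejection threshold for utterances that contain no vocabulary word, which this sketch omits.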