The machine recognition of the spoken digits zero through nine has been studied by digital computer simulation. The spoken utterances were converted to time-frequency patterns of spectral energy. Recognition was done by cross-correlating the pattern of an unknown utterance with a test pattern for each digit and selecting the digit with the highest correlation. Time normalization could be applied to all patterns, reducing utterances to a standard duration. Six male speakers and one female speaker provided 38 samples of each of the 10 digits. Speakers paused between successive words to allow segmentation. No errors were observed when a single speaker was recognized using test patterns from his own speech with time normalization. For a group of five male speakers, test patterns averaged over the group produced 6% errors with time normalization and 12% without. A 25% error rate occurred for the woman matched against male patterns. The study indicates both the effectiveness and the limitations of this simple recognition procedure for a limited vocabulary and a limited number of speakers. Time normalization improves performance in all cases.
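The recognition procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how time normalization or the correlation was computed, so the sketch assumes linear resampling to a fixed number of frames and a mean-removed, normalized correlation coefficient. The function names (`time_normalize`, `correlation`, `recognize`) and the `n_frames` parameter are hypothetical, and a pattern is assumed to be a list of frames, each frame a list of spectral energies.

```python
import math


def time_normalize(pattern, n_frames):
    """Linearly resample a list of spectral frames to n_frames frames.

    Assumption: linear interpolation along the time axis; the original
    study's normalization method is not specified in the abstract.
    """
    if len(pattern) == 1 or n_frames == 1:
        return [list(pattern[0]) for _ in range(n_frames)]
    out = []
    for i in range(n_frames):
        # Fractional position of output frame i in the input pattern.
        pos = i * (len(pattern) - 1) / (n_frames - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(pattern) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b
                    for a, b in zip(pattern[lo], pattern[hi])])
    return out


def correlation(a, b):
    """Normalized correlation of two equal-size time-frequency patterns.

    Assumption: mean-removed (Pearson-style) correlation over all cells.
    """
    xs = [v for frame in a for v in frame]
    ys = [v for frame in b for v in frame]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def recognize(unknown, templates, n_frames=8):
    """Return the best-matching label and the score for every template."""
    u = time_normalize(unknown, n_frames)
    scores = {label: correlation(u, time_normalize(t, n_frames))
              for label, t in templates.items()}
    return max(scores, key=scores.get), scores
```

With a dictionary of per-digit test patterns, `recognize(unknown, templates)` returns the label with the highest correlation together with all scores. Because every pattern is resampled to the same number of frames, an utterance spoken more slowly than its template can still be compared cell by cell, which is the role time normalization plays in the study.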
Denes et al. studied this question.