This paper describes an isolated-word recognition system based on character-string encoding. The system has achieved 98% correct recognition on limited vocabularies (20 to 54 words), and it incorporates speaker normalization, word segmentation, and learning paradigms. Audio input passes through a six-channel octave-band band-pass filter bank. The output of each channel is integrated over 10 ms intervals and then mapped to a logarithmic scale. An utterance is thus represented as a sequence of points, one generated every 10 ms, in the six-dimensional space defined by the six octave bands. Reference points are scattered throughout this space, and each 10 ms interval is assigned the label of the nearest reference point. We call the resulting string of labels a "character string". Encoding can be made arbitrarily precise, because more reference points give finer resolution. Even so, only 24 reference points are needed to achieve 98% correct recognition on a 54-word vocabulary in near real time. Techniques for generating strings are explored, and several learning schemes based on character strings are described. Finally, experiments are presented with a software classifier that uses "deformable templates" built from character strings.
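The encoding step amounts to nearest-neighbour labelling (vector quantization) of each 10 ms frame. The Python sketch below illustrates it under assumptions that the abstract does not specify: Euclidean distance, single-letter labels, and made-up reference coordinates. The word classifier shown here compares strings by plain Levenshtein edit distance against one stored string per word. This is a hypothetical stand-in chosen for illustration, not the paper's "deformable templates", whose details are not given in the abstract.

```python
from __future__ import annotations

import math
from typing import Sequence

# One 10 ms frame: log-mapped energies of the six octave bands.
Frame = Sequence[float]


def nearest_label(frame: Frame, refs: dict[str, Frame]) -> str:
    """Return the label of the reference point closest to `frame`.

    Assumption: Euclidean distance in the six-dimensional band space.
    """
    return min(refs, key=lambda label: math.dist(frame, refs[label]))


def encode(frames: Sequence[Frame], refs: dict[str, Frame]) -> str:
    """Turn a sequence of frames into a "character string", one label per frame."""
    return "".join(nearest_label(f, refs) for f in frames)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two character strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def classify(utterance: str, templates: dict[str, str]) -> str:
    """Hypothetical classifier: pick the word whose stored string is closest."""
    return min(templates, key=lambda word: edit_distance(utterance, templates[word]))


if __name__ == "__main__":
    # Made-up reference points; the paper uses 24, here we use 3 for brevity.
    refs = {"A": (0.0,) * 6, "B": (1.0,) * 6, "C": (2.0,) * 6}
    frames = [(0.1,) * 6, (0.9,) * 6, (1.1,) * 6, (2.2,) * 6]
    s = encode(frames, refs)
    print(s)  # ABBC
    print(classify("ABC", {"yes": "ABBC", "no": "CCAA"}))  # yes
```

Because each frame becomes a single symbol, adding reference points refines the encoding without changing the string-matching machinery, which is the trade-off the abstract describes between resolution and the number of reference points.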
George M. White studied this question.