September 5, 1999

Modeling and efficient decoding of large vocabulary conversational speech

Abstract

Capturing the large variability of conversational speech in the framework of purely phone-based speech recognizers is virtually impossible. It has been shown earlier that suprasegmental features such as speaking rate, duration, and syllabic, syntactic, and semantic structure are important predictors of pronunciation variation. In order to allow for a tighter coupling of these predictors of pronunciation, duration, and acoustic modeling, a new recognition toolkit has been developed. The phonetic transcription of speech has been generalized to an attribute-based representation, thus enabling the integration of suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription with markers for possible pronunciation effects, which are then taken into account by the acoustic model induction algorithm. A finite-state-machine, single-prefix-tree, one-pass, time-synchronous decoder is presented that efficiently decodes highly spontaneous speech within this new representational framework.

Authors

Michael Finke

Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)

Jürgen Fritsch

University Hospital Regensburg

Detlef Koll
