Capturing the large variability of conversational speech in the framework of purely phone-based speech recognizers is virtually impossible. It has been shown earlier that suprasegmental features such as speaking rate, duration, and syllabic, syntactic, and semantic structure are important predictors of pronunciation variation. In order to allow for a tighter coupling of these predictors of pronunciation, duration and acoustic modeling, a new recognition toolkit has been developed. The phonetic transcription of speech has been generalized to an attribute-based representation, thus enabling the integration of suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription with markers for possible pronunciation effects, which are then taken into account by the acoustic model induction algorithm. A finite-state-machine, single-prefix-tree, one-pass, time-synchronous decoder is presented that efficiently decodes highly spontaneous speech within this new representational framework.
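The two structural ideas named in the abstract, an attribute-based transcription and a single prefix tree over the lexicon, can be illustrated with a small sketch. This is not the authors' toolkit: the `Unit` and `Node` classes, the helper `u`, the attribute names (`word_initial`, `stressed`, `word_final`), and the three-word lexicon are all hypothetical and chosen only for illustration. The sketch shows how units that carry non-phonetic attributes can still share prefixes in one tree.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Unit:
    """One transcription unit: a phone label plus non-phonetic attributes.

    Hypothetical structure for illustration, not the paper's actual format.
    """
    phone: str
    attributes: frozenset = frozenset()


@dataclass
class Node:
    """Prefix-tree node; children are keyed by the full attribute-bearing unit."""
    children: dict = field(default_factory=dict)  # Unit -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_prefix_tree(lexicon):
    """Insert every pronunciation into one shared tree, merging common prefixes."""
    root = Node()
    for word, units in lexicon.items():
        node = root
        for unit in units:
            node = node.children.setdefault(unit, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def u(phone, *attrs):
    """Shorthand for building a Unit (hypothetical helper)."""
    return Unit(phone, frozenset(attrs))


# Toy lexicon with made-up attribute annotations.
LEXICON = {
    "cat": [u("k", "word_initial"), u("ae", "stressed"), u("t", "word_final")],
    "cab": [u("k", "word_initial"), u("ae", "stressed"), u("b", "word_final")],
    "can": [u("k", "word_initial"), u("ae", "stressed"), u("n", "word_final")],
}

TREE = build_prefix_tree(LEXICON)
```

Because the three words share their first two units, the tree holds six nodes (including the root) instead of the nine units a flat lexicon would store. Two units merge only when both the phone and all attributes match, so richer attribute sets reduce sharing; that trade-off is a property of this sketch and is not a claim about the paper's decoder.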
Michael Finke
Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)
Jürgen Fritsch
University Hospital Regensburg
Detlef Koll
Finke et al. (1999) studied this question.
synapsesocial.com/papers/6a204d314ad5e85db1e71ae6 — DOI: https://doi.org/10.21437/eurospeech.1999-120