Speech recognizers are typically trained on data from a standard dialect and do not generalize to non-standard dialects. The mismatch occurs mainly in the acoustic realization of words, which is captured by the acoustic models and the pronunciation lexicon. Standard techniques for addressing this mismatch are generative in nature and include acoustic model adaptation and expansion of the lexicon with pronunciation variants, both of which have limited effectiveness. We present a discriminative pronunciation model whose parameters are learned jointly with the parameters of the language model. We tease apart the gains from modeling the transitions of canonical phones, the transduction from surface to canonical phones, and the language model. We report experiments on African American Vernacular English (AAVE) using NPR's StoryCorps corpus. Our models improve performance over the baseline by about 2.1% on AAVE, of which 0.6% can be attributed to the pronunciation model. The model learns the most relevant phonetic transformations for AAVE speech.
Index Terms: large vocabulary speech recognition, dialectal speech recognition, pronunciation modeling, discriminative training
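The abstract does not spell out the training algorithm, so the following is only a minimal sketch of what "learned jointly" can mean, assuming a structured-perceptron reranker over n-best hypotheses (a common discriminative-training setup, not necessarily the authors' method). A single weight vector is shared by the three feature families the abstract names: canonical-phone transitions, surface-to-canonical phone transductions, and word-bigram language-model features. The `Hypothesis` class, the feature-name prefixes, the phone alignments, and the toy TH-fronting example (`with` realized as W IH F) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    # Hypothetical (surface, canonical) phone alignment for the hypothesis.
    phone_pairs: list[tuple[str, str]]
    # Word errors against the reference; used only to pick the oracle.
    errors: int


def features(h: Hypothesis) -> Counter:
    f = Counter()
    # Pronunciation features: surface -> canonical phone transductions.
    for surf, canon in h.phone_pairs:
        if surf != canon:
            f[f"pron:{surf}->{canon}"] += 1
    # Canonical phone transition features (phone bigrams).
    canon_seq = [canon for _, canon in h.phone_pairs]
    for a, b in zip(canon_seq, canon_seq[1:]):
        f[f"phn:{a}_{b}"] += 1
    # Language-model features: word bigrams with sentence boundaries.
    padded = ["<s>"] + h.words + ["</s>"]
    for a, b in zip(padded, padded[1:]):
        f[f"lm:{a}_{b}"] += 1
    return f


def score(w: dict, f: Counter) -> float:
    # Linear model: dot product of weights and feature counts.
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def train(nbest_lists: list[list[Hypothesis]], epochs: int = 5) -> dict:
    # One shared weight vector, so pronunciation, phone-transition and
    # LM weights are all updated by the same perceptron step.
    w: dict[str, float] = {}
    for _ in range(epochs):
        for nbest in nbest_lists:
            pred = max(nbest, key=lambda h: score(w, features(h)))
            oracle = min(nbest, key=lambda h: h.errors)
            if pred is not oracle:
                for k, v in features(oracle).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in features(pred).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```

For example, if the recognizer proposes `whiff him` ahead of the correct `with him`, one perceptron update raises the weight of the `pron:F->TH` transduction together with the `lm:with_him` bigram, and the correct hypothesis then outranks the misrecognition. Sharing one weight vector is what lets the pronunciation and language-model evidence trade off against each other during training.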
Lehr et al. studied this question.