September 6, 2009

Tandem representations of spectral envelope and modulation frequency features for ASR

Abstract

We present a feature extraction technique for automatic speech recognition (ASR) that uses a Tandem representation of short-term spectral envelope and modulation frequency features. These features, derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction, are combined at the phoneme posterior level. Tandem representations derived from these phoneme posteriors are used with HMM-based ASR systems for both small-vocabulary and large-vocabulary continuous speech recognition (LVCSR) tasks. For a small-vocabulary continuous digit task on the OGI Digits database, the proposed features reduce the word error rate (WER) by 13% relative to other feature extraction techniques. For an LVCSR task using the NIST RT05 evaluation data, we obtain a relative WER reduction of about 14%. For phoneme recognition on the TIMIT database, these features provide a relative improvement of 13% over other techniques.
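The abstract reports its gains as relative reductions, so a short worked example of that arithmetic may help. This is a minimal sketch: the function name `relative_reduction` and the baseline and proposed WER values are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent (NOT taken from the paper).
baseline = 4.0   # assumed WER of a baseline feature set
proposed = 3.5   # assumed WER of the proposed features

# The absolute reduction is 0.5 points; the relative reduction is 0.5 / 4.0.
print(f"{relative_reduction(baseline, proposed):.1f}% relative")  # prints "12.5% relative"
```

A relative reduction therefore depends on the baseline: the same percentage corresponds to a smaller absolute change when the baseline WER is already low.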

Cite This Study

Thomas et al. (2009) studied this question.

synapsesocial.com/papers/6a206e693d50bdc5d1029b85 https://doi.org/10.21437/interspeech.2009-748
