The Microsoft 2017 Conversational Speech Recognition System

Abstract

We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also add another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
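
The two-stage combination described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, the equal model weights are an assumption, and the word-level stage assumes the subsystem hypotheses have already been aligned into slots, which stands in for real confusion-network construction.

```python
from collections import defaultdict


def combine_frame_posteriors(model_posteriors, weights=None):
    """Stage 1: average per-frame senone posteriors across acoustic models.

    model_posteriors: list of models, each a list of frames, each frame a
    list of senone probabilities. All models must share the same shape.
    """
    n_models = len(model_posteriors)
    if weights is None:
        # Equal weighting is an assumption made for this sketch.
        weights = [1.0 / n_models] * n_models
    n_frames = len(model_posteriors[0])
    n_senones = len(model_posteriors[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(w * model[t][s] for w, model in zip(weights, model_posteriors))
            for s in range(n_senones)
        ]
        total = sum(frame)
        combined.append([p / total for p in frame])  # renormalize each frame
    return combined


def vote_aligned_slots(slots):
    """Stage 2: pick the highest-scoring word in each pre-aligned slot.

    slots: list of slots; each slot is a list of (word, score) pairs, one per
    subsystem, standing in for one confusion-network bin. "" marks a deletion.
    """
    output = []
    for slot in slots:
        totals = defaultdict(float)
        for word, score in slot:
            totals[word] += score
        best = max(totals, key=totals.get)
        if best:  # drop the slot when the empty word wins the vote
            output.append(best)
    return output


# Toy inputs: two acoustic models, two frames, three senones.
posteriors_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
posteriors_b = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
combined = combine_frame_posteriors([posteriors_a, posteriors_b])

# Toy inputs: three subsystems voting over three aligned slots.
slots = [
    [("i", 0.9), ("i", 0.8), ("hi", 0.6)],
    [("see", 0.5), ("sea", 0.4), ("see", 0.7)],
    [("", 0.6), ("it", 0.3), ("", 0.5)],
]
hypothesis = vote_aligned_slots(slots)
```

In this toy run the second frame's combined posterior favors the third senone even though the two models disagree, and the vote drops the third slot because the deletion outscores "it".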

Cite This Study

Xiong et al. studied this question.

synapsesocial.com/papers/69de5f4e57c7c8340a5585a6 https://doi.org/10.1109/icassp.2018.8461870
