Key points are not available for this paper at this time.
We describe the latest version of Microsoft's conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
Xiong et al. (Sun,) studied this question.
Synapse has enriched 3 closely related papers on similar clinical questions. Consider them for comparative context: