The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a recently proposed acoustic-modeling technique for HMM-based speech recognition that can greatly outperform conventional Gaussian-mixture based HMMs. For example, a CD-DNN-HMM trained on the 2000h Fisher corpus achieves 14.4% word error rate on the Hub5’00-FSH speaker-independent phone-call transcription task, compared to 19.6% obtained by a state-of-the-art, conventional discriminatively trained GMM-based HMM. That CD-DNN-HMM, however, took 59 days to train on a modern GPGPU; the immense computational cost of the mini-batch based back-propagation (BP) training is a major roadblock. Unlike the familiar Baum-Welch training for conventional HMMs, BP cannot be efficiently parallelized across data. In this paper we show that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server. Using 2 and 4 GPGPUs, we achieve a 1.9 and 3.3 times end-to-end speed-up, at parallelization efficiency of 0.95 and 0.82, respectively, at no loss of recognition accuracy.

Index Terms: speech recognition, deep neural networks, parallelization, GPGPU
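The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that parallelization efficiency means speed-up divided by the number of GPGPUs, which is the usual definition but is not spelled out in the abstract. Under that assumption the four-card case works out to 0.825, consistent with the reported 0.82. The function names are illustrative and do not come from the paper.

```python
def parallel_efficiency(speedup: float, num_devices: int) -> float:
    """Efficiency under the assumed definition: speed-up per device."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Speed-ups reported in the abstract for 2 and 4 GPGPUs.
    for devices, speedup in [(2, 1.9), (4, 3.3)]:
        eff = parallel_efficiency(speedup, devices)
        print(f"{devices} GPGPUs: speed-up {speedup}, efficiency {eff:.3f}")

    # Word error rates reported in the abstract: GMM-HMM 19.6%, CD-DNN-HMM 14.4%.
    rel = relative_reduction(19.6, 14.4)
    print(f"relative WER reduction: {rel:.1%}")
```

The same helper also shows that the drop from 19.6% to 14.4% word error rate is a relative reduction of roughly a quarter, which is the scale of improvement that makes the 59-day training cost worth attacking.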
Xie et al. (Sun,) studied this question.