September 9, 2012

Pipelined back-propagation for context-dependent deep neural networks

Chen Xie (Sichuan University of Science and Engineering), Adam Eversole (Microsoft, United States), Gang Li (Shanxi University)


Abstract

The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a recently proposed acoustic-modeling technique for HMM-based speech recognition that can greatly outperform conventional Gaussian-mixture based HMMs. For example, a CD-DNN-HMM trained on the 2000h Fisher corpus achieves 14.4% word error rate on the Hub5’00-FSH speaker-independent phone-call transcription task, compared to 19.6% obtained by a state-of-the-art, conventional discriminatively trained GMM-based HMM. That CD-DNN-HMM, however, took 59 days to train on a modern GPGPU: the immense computational cost of the mini-batch based back-propagation (BP) training is a major roadblock. Unlike the familiar Baum-Welch training for conventional HMMs, BP cannot be efficiently parallelized across data. In this paper we show that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server. Using 2 and 4 GPGPUs, we achieve a 1.9 and 3.3 times end-to-end speed-up, at parallelization efficiency of 0.95 and 0.82, respectively, at no loss of recognition accuracy.

Index Terms: speech recognition, deep neural networks, parallelization, GPGPU
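The speed-up and efficiency figures in the abstract appear to follow the usual relation, efficiency = speed-up / number of devices. The sketch below checks that arithmetic. It assumes this standard definition, which the abstract does not spell out, and the function name `parallel_efficiency` is purely illustrative.

```python
def parallel_efficiency(speedup: float, num_devices: int) -> float:
    """Speed-up divided by device count; 1.0 would mean perfect scaling.

    Assumption: this is the standard definition of parallelization
    efficiency; the abstract itself does not state the formula.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


# (GPGPU count, end-to-end speed-up) pairs as quoted in the abstract.
reported = [(2, 1.9), (4, 3.3)]

for n, s in reported:
    eff = parallel_efficiency(s, n)
    print(f"{n} GPGPUs: speed-up {s}x, efficiency {eff:.3f}")
```

With the rounded speed-ups, the 2-GPGPU case gives 0.95 and the 4-GPGPU case gives about 0.825, which is consistent with the reported 0.95 and 0.82 if the published values were computed from unrounded timings.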
