September 9, 2012

Parallel training for deep stacking networks

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

The Deep Stacking Network (DSN) is a special type of deep architecture developed to enable and benefit from parallel learning of its model parameters on large CPU clusters. As a prospective key component of future speech recognizers, the architectural design of the DSN and its parallel training endow the DSN with scalability over a vast amount of training data. In this paper, we present our first parallel implementation of the DSN training algorithm. Particularly, we show the tradeoff between the time/memory saving via training parallelism and the associated cost arising from inter-CPU communication. Further, in phone classification experiments, we demonstrate a significantly lowered error rate using parallel full-batch training distributed over a CPU cluster, compared with sequential minibatch training implemented in a single CPU machine under otherwise identical experimental conditions and as exploited prior to the work reported in this paper. Index Terms: parallel and distributed computing, deep stacking networks, full-batch training, phone classification

Preguntar a la IA

Me gusta

Guardar