May 1, 2013

An empirical study of learning rates in deep neural networks for speech recognition

Abstract

Recent deep neural network systems for large vocabulary speech recognition are trained with minibatch stochastic gradient descent but use a variety of learning rate scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based on our analysis of its limitations, we propose a new variant 'AdaDec' that decouples long-term learning-rate scheduling from per-parameter learning rate variation. AdaDec was found to result in higher frame accuracies than other methods. Overall, careful choice of learning rate schemes leads to faster convergence and lower word error rates.
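To make the distinction in the abstract concrete, the sketch below contrasts a standard AdaGrad step with a hypothetical "decoupled" step. The abstract does not give AdaDec's update rule, so `decoupled_step`, its `decay` and `k` parameters, and `exponential_schedule` are illustrative assumptions rather than the authors' formulation. The only idea taken from the abstract is that the long-term schedule (`global_lr`) is supplied separately from the per-parameter scaling.

```python
import math


def adagrad_step(params, grads, accum, base_lr=0.1, eps=1e-8):
    """Standard AdaGrad: the accumulator only grows, so every
    per-parameter learning rate can only shrink over training."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= base_lr / (math.sqrt(accum[i]) + eps) * g
    return params, accum


def decoupled_step(params, grads, accum, global_lr, decay=0.999, k=1.0):
    """Hypothetical decoupled variant (an assumption, not the paper's rule):
    a decayed accumulator tracks recent gradient magnitude per parameter,
    while the long-term schedule arrives separately as global_lr."""
    for i, g in enumerate(grads):
        accum[i] = decay * accum[i] + g * g
        params[i] -= global_lr / math.sqrt(k + accum[i]) * g
    return params, accum


def exponential_schedule(step, lr0=0.1, half_life=1000):
    """Illustrative global schedule: halve the rate every half_life steps."""
    return lr0 * 0.5 ** (step / half_life)


if __name__ == "__main__":
    # Toy objective f(x) = x0^2 + 10 * x1^2 with gradient (2*x0, 20*x1).
    params, accum = [1.0, 1.0], [0.0, 0.0]
    for step in range(200):
        grads = [2.0 * params[0], 20.0 * params[1]]
        decoupled_step(params, grads, accum, exponential_schedule(step))
    print(params)
```

In this sketch the decayed accumulator no longer forces the effective rate toward zero on its own, so any long-term decrease comes only from the schedule function. That is one plausible reading of "decoupling", not a reproduction of the published method.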

Cite this study

Senior et al. (2013) studied this question.

www.synapsesocial.com/papers/6a1559b3a2f71238514e559b — DOI: https://doi.org/10.1109/icassp.2013.6638963

Authors

Andrew Senior

Georg Heigold

Marc’Aurelio Ranzato

Institutions

Google (United States)
