Recent deep neural network systems for large vocabulary speech recognition are trained with minibatch stochastic gradient descent but use a variety of learning rate scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based on our analysis of its limitations, we propose a new variant, 'AdaDec', that decouples long-term learning-rate scheduling from per-parameter learning rate variation. AdaDec was found to result in higher frame accuracies than other methods. Overall, careful choice of learning rate schemes leads to faster convergence and lower word error rates.
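To make the distinction in the abstract concrete, here is a minimal sketch in plain Python. The first function is the standard AdaGrad update, in which each parameter's step is divided by the square root of its accumulated squared gradients. The second function is only an illustration of the decoupling idea, not the AdaDec formulation from the paper: the function name `decoupled_step`, the exponentially decayed accumulator, the halving schedule, and all default values (`decay`, `half_life`, `eps`) are assumptions made for this example.

```python
import math


def adagrad_step(params, grads, accum, base_lr=0.1, eps=1e-8):
    """Standard AdaGrad update, applied in place to `params` and `accum`.

    The sum of squared gradients only grows, so the per-parameter
    learning rate can only shrink over training.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= base_lr * g / (math.sqrt(accum[i]) + eps)


def decoupled_step(params, grads, accum, step, base_lr=0.1,
                   decay=0.999, half_life=1000.0, eps=1e-8):
    """Hypothetical decoupled update (an assumption, not the paper's exact rule).

    Long-term scheduling: a global rate that halves every `half_life` steps.
    Per-parameter variation: a decayed accumulator of squared gradients,
    so old gradients are gradually forgotten instead of summed forever.
    """
    global_lr = base_lr * 0.5 ** (step / half_life)
    for i, g in enumerate(grads):
        accum[i] = decay * accum[i] + g * g
        params[i] -= global_lr * g / (math.sqrt(accum[i]) + eps)
    return global_lr


if __name__ == "__main__":
    # Toy objective f(x, y) = x^2 + 10 * y^2, with gradient (2x, 20y).
    params, accum = [1.0, 1.0], [0.0, 0.0]
    for step in range(200):
        grads = [2.0 * params[0], 20.0 * params[1]]
        decoupled_step(params, grads, accum, step)
    print(params)
```

In the sketch, the schedule (`global_lr`) and the per-parameter scaling (`accum`) can be tuned independently, which is the property the abstract attributes to AdaDec; the specific schedule and decay used in the paper should be taken from the paper itself.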
Senior et al. studied this question.
www.synapsesocial.com/papers/6a1559b3a2f71238514e559b — DOI: https://doi.org/10.1109/icassp.2013.6638963
Andrew Senior
Georg Heigold
Marc’Aurelio Ranzato
Google (United States)