Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically reduces generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time, and dropout is likely only to increase training time further. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and have proved faster to train than networks with standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs and trained with dropout during frame-level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
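To make the two ingredients concrete, here is a minimal sketch of a single fully connected hidden layer that applies a ReLU non-linearity followed by dropout. It assumes the common "inverted dropout" scaling: surviving units are divided by the keep probability during training so that no rescaling is needed at test time. The original dropout formulation instead halves the outgoing weights at test time; the two are equivalent in expectation. The function names (`relu`, `dropout`, `relu_layer`) and the plain-list representation are illustrative choices, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence


def relu(x: float) -> float:
    """Rectified linear unit: returns max(0, x)."""
    return max(0.0, x)


def dropout(activations: Sequence[float], p: float, train: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout (an assumed variant, see lead-in).

    During training, each unit is zeroed with probability p and survivors
    are scaled by 1/(1-p), so the expected activation matches test time.
    At test time the activations pass through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not train or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # rng.random() >= p holds with probability 1 - p, i.e. the unit is kept.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


def relu_layer(x: Sequence[float], weights: Sequence[Sequence[float]],
               biases: Sequence[float], p: float, train: bool,
               rng: Optional[random.Random] = None) -> List[float]:
    """One hidden layer: affine map (one weight row per unit), ReLU, dropout."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return dropout([relu(z) for z in pre], p, train, rng)
```

Calling `relu_layer(x, W, b, p=0.5, train=True)` on every minibatch and `train=False` when decoding mirrors the usual training/test split. Because ReLU outputs are exactly zero for negative pre-activations and linear otherwise, the layer is cheap to evaluate and its gradient does not saturate for positive inputs. This is one common explanation for why ReLU networks train faster than sigmoid networks.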
Dahl et al. studied this question.
synapsesocial.com/papers/6a07fabd7ad161a3abfe0f1b — DOI: https://doi.org/10.1109/icassp.2013.6639346
George E. Dahl
Swarthmore College
Tara N. Sainath
Massachusetts Institute of Technology
Geoffrey E. Hinton
University of New Brunswick
University of Toronto
IBM (United States)
IBM Research - Thomas J. Watson Research Center