Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
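To make the phrase "layer-wise normalization" concrete, the following is a minimal sketch of a Heavy-ball momentum step next to a LARS-style step for a single layer. It is a simplified illustration and not the implementation evaluated in the paper: weight decay is omitted, and the function names, the `trust_coef` parameter, and the default values are assumptions chosen for this example.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def heavy_ball_step(w, g, v, lr=0.1, momentum=0.9):
    """Plain Heavy-ball momentum for one layer: v <- m*v + g, w <- w - lr*v."""
    v_new = [momentum * vi + gi for vi, gi in zip(v, g)]
    w_new = [wi - lr * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def lars_style_step(w, g, v, lr=0.1, momentum=0.9, trust_coef=0.001, eps=1e-9):
    """Same update, but the layer's gradient is first rescaled by a
    layer-wise trust ratio proportional to ||w|| / ||g||.

    Simplified for illustration (assumption): no weight decay term.
    """
    w_norm, g_norm = l2_norm(w), l2_norm(g)
    if w_norm > 0 and g_norm > 0:
        ratio = trust_coef * w_norm / (g_norm + eps)
    else:
        ratio = 1.0  # fall back to the unscaled gradient
    v_new = [momentum * vi + ratio * gi for vi, gi in zip(v, g)]
    w_new = [wi - lr * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


# Hypothetical single-layer example: the same gradient direction at two scales.
w0, v0 = [3.0, 4.0], [0.0, 0.0]
g_small = [0.6, 0.8]
g_large = [600.0, 800.0]

hb_small, _ = heavy_ball_step(w0, g_small, v0)
hb_large, _ = heavy_ball_step(w0, g_large, v0)
lars_small, _ = lars_style_step(w0, g_small, v0)
lars_large, _ = lars_style_step(w0, g_large, v0)
```

In this sketch the Heavy-ball step grows with the gradient magnitude, while the LARS-style step depends almost only on the gradient direction and the weight norm of the layer. LAMB applies the same kind of per-layer rescaling to an Adam-style update rather than to the raw gradient.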
Nado et al. studied this question.