This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes popular algorithms such as Adam, AMSGrad and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee convergence for Adam-type methods. We prove that under our derived conditions, these methods can achieve a convergence rate of order O(log T / √T) for nonconvex stochastic optimization. We show the conditions are essential in the sense that violating them may make the algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive gradient algorithms, which has the same O(log T / √T) convergence rate. Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
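To make "update the search directions and learning rates simultaneously" concrete, the following is a minimal sketch of an Adam-type iteration on a one-dimensional problem. It assumes the commonly used form of these updates (an exponential moving average of gradients as the search direction, and a per-step scaling built from past squared gradients); the abstract itself does not spell out the update rule. The function name `adam_type_minimize`, its parameters, the 1/√t step-size schedule, and the toy objective are all illustrative assumptions, not taken from the paper.

```python
import math


def adam_type_minimize(grad, x0, steps=2000, alpha=0.1, beta1=0.9,
                       beta2=0.999, variant="amsgrad", eps=1e-8):
    """Hypothetical helper: run an Adam-type iteration on a scalar problem.

    grad    -- callable returning the (possibly stochastic) gradient at x
    variant -- how the squared-gradient scaling is built from past gradients
    """
    x = x0
    m = 0.0         # momentum term: moving average of past gradients
    v = 0.0         # moving average of past squared gradients
    v_max = 0.0     # running maximum of v (used by the AMSGrad-style rule)
    g_sq_sum = 0.0  # plain sum of squared gradients (AdaGrad-style rule)

    for t in range(1, steps + 1):
        g = grad(x)

        # Search direction: built from past gradients via momentum.
        m = beta1 * m + (1.0 - beta1) * g

        # Scaling: the variants differ only in how past squared gradients
        # are aggregated.
        if variant == "adam":
            v = beta2 * v + (1.0 - beta2) * g * g
            scale = v
        elif variant == "amsgrad":
            v = beta2 * v + (1.0 - beta2) * g * g
            v_max = max(v_max, v)  # scaling never decreases
            scale = v_max
        elif variant == "adagrad":
            g_sq_sum += g * g
            scale = g_sq_sum / t   # average of all past squared gradients
        else:
            raise ValueError(f"unknown variant: {variant!r}")

        # Assumed diminishing base step size alpha / sqrt(t).
        alpha_t = alpha / math.sqrt(t)
        x -= alpha_t * m / (math.sqrt(scale) + eps)

    return x


def toy_grad(x):
    """Gradient of the nonconvex toy objective f(x) = x**2 + 3*sin(x)**2."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


for name in ("adam", "amsgrad", "adagrad"):
    x_final = adam_type_minimize(toy_grad, x0=2.0, variant=name)
    print(name, x_final, abs(toy_grad(x_final)))
```

The toy objective is nonconvex but has a single stationary point at x = 0, so the printed gradient magnitude gives a rough check that each variant approaches stationarity. Setting `beta1=0.0` with `variant="adagrad"` removes the momentum term and gives the usual momentum-free AdaGrad-style step.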
Chen et al. studied this question.