In this work, we establish a linear convergence estimate for gradient descent with delay [Formula: see text] when the cost function is [Formula: see text]-strongly convex and [Formula: see text]-smooth. This result improves upon the well-known estimates of Y. Arjevani, O. Shamir and N. Srebro, A tight convergence analysis for stochastic gradient descent with delayed updates, Proc. Mach. Learn. Res. 117 (2020) 111–132, and S. U. Stich and S. P. Karimireddy, The error-feedback framework: Better rates for SGD with delayed gradients and compressed updates, J. Mach. Learn. Res. 21(1) (2020) 9613–9648, in the sense that it is non-ergodic and still holds under a weaker constraint on the cost function. In addition, the range of the learning rate [Formula: see text] can be extended from [Formula: see text] to [Formula: see text] for [Formula: see text] and [Formula: see text] for [Formula: see text], where [Formula: see text] is the Lipschitz constant of the gradient of the cost function. We further show linear convergence of the cost function under the Polyak–Łojasiewicz (PL) condition, for which the admissible range of the learning rate is further improved to [Formula: see text] for large delay [Formula: see text]. The framework of the proof of this result is also extended to stochastic gradient descent with time-varying delay under the PL condition. Finally, numerical experiments are provided to confirm the reliability of the analytical results.
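To make the setting concrete, the following is a minimal Python sketch of gradient descent with a fixed delay, assuming the commonly used update in which the step at time t uses the gradient evaluated at the iterate from tau steps earlier. The function name `delayed_gradient_descent`, the quadratic test problem, and the step size are illustrative assumptions; in particular, the step size is a deliberately conservative choice and is not the learning-rate range established in the paper.

```python
import numpy as np


def delayed_gradient_descent(grad, x0, eta, tau, num_steps):
    """Run x_{t+1} = x_t - eta * grad(x_{t - tau}) and return all iterates.

    Assumption: for t < tau the stale iterate is taken to be x_0.
    """
    x0 = np.asarray(x0, dtype=float)
    history = [x0.copy()]
    for t in range(num_steps):
        stale = history[max(t - tau, 0)]  # iterate seen by the delayed gradient
        history.append(history[-1] - eta * grad(stale))
    return np.array(history)


# Hypothetical test problem: f(x) = 0.5 * x^T A x, which is strongly convex
# with constant 1 and has a gradient that is Lipschitz with constant 10.
A = np.diag([1.0, 10.0])
L = 10.0
tau = 3
eta = 1.0 / (10.0 * L * tau)  # conservative illustrative step size (assumption)

iterates = delayed_gradient_descent(lambda x: A @ x, [1.0, 1.0], eta, tau, 2000)
errors = np.linalg.norm(iterates, axis=1)  # distance of each iterate to the minimizer 0
```

Here `errors[-1]` measures the last iterate itself rather than an average of the iterates, which is the kind of quantity a non-ergodic estimate controls.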
Choi et al. (Sun,) studied this question.