February 18, 2024 · Open Access

Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak–Łojasiewicz condition

Abstract

In this work, we establish a linear convergence estimate for gradient descent with delay [Formula: see text] when the cost function is [Formula: see text]-strongly convex and [Formula: see text]-smooth. This result improves upon the well-known estimates in Y. Arjevani, O. Shamir and N. Srebro, A tight convergence analysis for stochastic gradient descent with delayed updates, Proc. Mach. Learn. Res. 117 (2020) 111–132 and S. U. Stich and S. P. Karimireddy, The error-feedback framework: Better rates for SGD with delayed gradients and compressed updates, J. Mach. Learn. Res. 21(1) (2020) 9613–9648 in two respects: it is non-ergodic, and it holds under a weaker constraint on the cost function. In addition, the range of the learning rate [Formula: see text] can be extended from [Formula: see text] to [Formula: see text] for [Formula: see text] and [Formula: see text] for [Formula: see text], where [Formula: see text] is the Lipschitz constant of the gradient of the cost function. We further show linear convergence of the cost function under the Polyak–Łojasiewicz [Formula: see text] (PL) condition, for which the admissible learning rate is further improved to [Formula: see text] for large delay [Formula: see text]. The proof framework for this result also extends to stochastic gradient descent with time-varying delay under the PL condition. Finally, numerical experiments are provided to support the theoretical results.
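As an illustration of the setting in the abstract, the following is a minimal sketch of gradient descent with a fixed delay, in which each update uses the gradient evaluated at an iterate from several steps earlier. The names `delayed_gradient_descent`, `eta` and `tau`, the quadratic test problem and the conservative step size are all assumptions made for this example; they are not the paper's notation, bounds, or experiments. The quantity tracked is the cost at the last iterate, which is what a non-ergodic estimate refers to.

```python
import numpy as np

def delayed_gradient_descent(grad, x0, eta, tau, steps):
    """Iterate x_{t+1} = x_t - eta * grad(x_{t - tau}).

    For t < tau the stale point is taken to be x0 (an assumption of this sketch).
    Returns the list of all iterates, starting with x0.
    """
    history = [np.asarray(x0, dtype=float)]
    for t in range(steps):
        stale = history[max(t - tau, 0)]  # iterate from tau steps ago
        history.append(history[-1] - eta * grad(stale))
    return history

# Hypothetical test problem: f(x) = 0.5 * x^T A x with A = diag(1, 4),
# which is 1-strongly convex and has a 4-Lipschitz gradient.
A = np.diag([1.0, 4.0])

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

lipschitz = 4.0
tau = 3
# Deliberately conservative step size chosen for this sketch only;
# it is not one of the learning-rate bounds from the paper.
eta = 0.1 / (lipschitz * tau)

iterates = delayed_gradient_descent(grad, [1.0, -2.0], eta, tau, steps=2000)
costs = [f(x) for x in iterates]  # cost at each iterate, not at an average
```

Plotting `costs` on a logarithmic scale should show an approximately straight line, which is the signature of linear convergence of the last iterate.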
