In this work, we establish a linear convergence estimate for gradient descent with delay [formula: see text] when the cost function is [formula: see text]-strongly convex and [formula: see text]-smooth. This result improves upon the well-known estimates of Y. Arjevani, O. Shamir and N. Srebro, A tight convergence analysis for stochastic gradient descent with delayed updates, Proc. Mach. Learn. Res. 117 (2020) 111–132, and S. U. Stich and S. P. Karimireddy, The error-feedback framework: Better rates for SGD with delayed gradients and compressed updates, J. Mach. Learn. Res. 21(1) (2020) 9613–9648, in the sense that it is non-ergodic and holds under a weaker constraint on the cost function. In addition, the range of the learning rate [formula: see text] can be extended from [formula: see text] to [formula: see text] for [formula: see text] and [formula: see text] for [formula: see text], where [formula: see text] is the Lipschitz constant of the gradient of the cost function. We further show linear convergence of the cost function under the Polyak–Łojasiewicz (PL) condition, for which the admissible learning rate is further improved to [formula: see text] for large delay [formula: see text]. The framework of the proof is also extended to stochastic gradient descent with time-varying delay under the PL condition. Finally, numerical experiments are provided to confirm the reliability of the analysis.
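To make the setting concrete, the following is a minimal sketch of gradient descent with a fixed delay on a small strongly convex quadratic. It is an illustration under stated assumptions, not the authors' implementation or experiments: the update rule `x_{t+1} = x_t - eta * grad(x_{t - tau})`, the names `eta` (learning rate) and `tau` (delay), the test matrix `A`, and the step size `0.1 / (L * tau)` are all chosen here for illustration. In particular, the step size is a deliberately conservative value and is not one of the bounds stated in the abstract.

```python
import numpy as np

def delayed_gradient_descent(grad, x0, eta, tau, num_steps):
    """Iterate x_{t+1} = x_t - eta * grad(x_{t - tau}).

    Assumption for this sketch: x_t = x0 for all t <= 0, so the first
    tau updates all use the gradient at the starting point.
    """
    # history holds x_{-tau}, ..., x_0 initially, then grows by one per step
    history = [np.asarray(x0, dtype=float)] * (tau + 1)
    for _ in range(num_steps):
        stale = history[-(tau + 1)]  # the delayed iterate x_{t - tau}
        history.append(history[-1] - eta * grad(stale))
    return history[tau:]  # x_0, x_1, ..., x_{num_steps}

# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimized at x = 0.
A = np.diag([1.0, 4.0])   # strong convexity constant 1, smoothness constant 4
L = 4.0
tau = 5
eta = 0.1 / (L * tau)     # illustrative conservative step size (assumption)

iterates = delayed_gradient_descent(lambda x: A @ x, [1.0, 1.0], eta, tau, 2000)
errors = [float(np.linalg.norm(x)) for x in iterates]  # distance to the minimizer
```

Plotting `errors` on a logarithmic axis should show a roughly straight line, which is what a linear (geometric) convergence rate for the last iterate looks like. Setting `tau = 0` reduces the sketch to ordinary gradient descent.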
Choi et al. studied this question.