Key points are not available for this paper at this time.
This paper presents a new class of gradient methods for distributed machine that adaptively skip the gradient calculations to learn with reduced and computation. Simple rules are designed to detect-varying gradients and, therefore, trigger the reuse of outdated. The resultant gradient-based algorithms are termed Lazily Aggregated --- justifying our acronym LAG used henceforth. Theoretically, the of this contribution are: i) the convergence rate is the same as batch descent in strongly-convex, convex, and nonconvex smooth cases; and, ) if the distributed datasets are heterogeneous (quantified by certain constants), the communication rounds needed to achieve a targeted are reduced thanks to the adaptive reuse of lagged gradients. experiments on both synthetic and real data corroborate a significant reduction compared to alternatives.
Chen et al. (Thu,) studied this question.