Stochastic optimization is the workhorse behind the success of many machine learning algorithms. The existing theoretical analysis of stochastic optimization mainly focuses on the behavior on the training dataset or requires a convexity assumption. In this paper, we provide a comprehensive analysis on the generalization behavior of stochastic optimization with nonconvex problems. We first present both upper and lower bounds on the uniform convergence of gradients. Our analysis outperforms existing results by incorporating the 2nd moment of the gradient at a single model into the upper bound. Based on this uniform convergence, we provide a high-probability bound on the gradient norm of population risks for stochastic gradient descent (SGD), which significantly improves the existing results. We show that better bounds can be achieved under further assumptions such as quasi-convexity or Polyak-Łojasiewicz condition. Our analysis shows the computation cost can be further decreased by taking the variance-reduction trick. Finally, we study the utility guarantee of SGD under a privacy constraint. Our results show a linear speed up with respect to the batch size, which shows the benefit of computing gradients in a distributed manner.
Building similarity graph...
Analyzing shared references across papers
Loading...
Yunwen Lei
University of Hong Kong
IEEE Transactions on Pattern Analysis and Machine Intelligence
University of Hong Kong
Building similarity graph...
Analyzing shared references across papers
Loading...
Yunwen Lei (Wed,) studied this question.
synapsesocial.com/papers/68f10ecee6a12fd0428998e5 — DOI: https://doi.org/10.1109/tpami.2025.3621591