June 23, 2024 · Open Access

Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution

Abstract

We consider a variant of stochastic gradient descent (SGD) with a random learning rate and establish its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous studies have analyzed the convergence properties of SGD and its simplified variants. Among these, analyses based on a stationary distribution of the updated parameters provide generalizable results. However, to obtain a stationary distribution, the update direction of the parameters must not degenerate, which limits the variants of SGD to which this approach applies. In this study, we consider a novel SGD variant, Poisson SGD, which has degenerate parameter update directions and instead uses a random learning rate. We show that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions on the loss function. Based on this, we further show that Poisson SGD finds global minima in non-convex optimization problems, and we also evaluate its generalization error. As a proof technique, we approximate the distribution produced by Poisson SGD with that of the bouncy particle sampler (BPS) and derive its stationary distribution, using theoretical advances on piecewise deterministic Markov processes (PDMPs).
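The abstract does not state the update rule, so the following is only a rough illustration of what SGD with a random learning rate can look like. It is a sketch under stated assumptions, not the authors' Poisson SGD: the step size is drawn from an exponential distribution (the inter-arrival law of a Poisson process), and the toy loss, the Gaussian gradient noise, and every name (`grad`, `random_lr_sgd`, `mean_lr`, `noise_std`) are hypothetical choices made for this example.

```python
import random


def grad(theta):
    """Gradient of the toy non-convex loss f(x) = x**4 - 3*x**2 + x (an assumption)."""
    return 4 * theta**3 - 6 * theta + 1


def random_lr_sgd(theta0, mean_lr=0.005, noise_std=0.1, steps=5000, seed=0):
    """Run noisy gradient descent where each step size is sampled at random.

    Assumption: step sizes are i.i.d. exponential with mean `mean_lr`.
    The paper's actual learning-rate law is not given in the abstract.
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        # Gaussian noise stands in for minibatch sampling noise.
        noisy_grad = grad(theta) + rng.gauss(0.0, noise_std)
        # expovariate takes the rate, i.e. 1 / mean.
        lr = rng.expovariate(1.0 / mean_lr)
        theta -= lr * noisy_grad
    return theta


if __name__ == "__main__":
    print(random_lr_sgd(theta0=1.0))
```

Running the script prints the final iterate for one seed. Repeating the run over many seeds and plotting a histogram of the final iterates gives an empirical picture of the kind of parameter distribution the abstract refers to, although this toy setup makes no claim to reproduce the paper's results.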

Cite this study

Yoshida et al. (2024) studied this question.

synapsesocial.com/papers/68e63ae4b6db6435875cc746 — DOI: https://doi.org/10.48550/arxiv.2406.16032

Authors

Naoki Yoshida

Panasonic (Japan)

Shogo Nakakita

Tokyo University of Science

Masaaki Imaizumi

Tokyo University of the Arts
