October 7, 2025 (Open Access)

Convergence of Clipped SGD on Convex (L₀, L₁)-Smooth Functions

Key Points

On convex (L₀, L₁)-smooth functions, SGD with gradient clipping achieves a high-probability convergence rate comparable to the SGD rate for L-smooth functions.
The rate matches the SGD rate in the L-smooth case up to polylogarithmic factors and additive terms.
A variation of adaptive SGD with gradient clipping is proposed and achieves the same guarantee.
Empirical experiments examine the theory and the algorithmic choices.

Abstract

We study stochastic gradient descent (SGD) with gradient clipping on convex functions under a generalized smoothness assumption called (L₀, L₁)-smoothness. Using gradient clipping, we establish a high-probability convergence rate that matches the SGD rate in the L-smooth case up to polylogarithmic factors and additive terms. We also propose a variation of adaptive SGD with gradient clipping, which achieves the same guarantee. We perform empirical experiments to examine our theory and algorithmic choices.
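To make the setting concrete, here is a minimal sketch of SGD with norm-based gradient clipping, written in plain Python. It is an illustration only and is not taken from the paper: the test function (a quartic, which is convex but whose gradient is not globally Lipschitz), the Gaussian noise model, the step size, the clipping threshold, and the function names `clip`, `clipped_sgd`, and `quartic_stoch_grad` are all assumptions chosen for the example.

```python
import math
import random


def clip(grad, threshold):
    """Rescale grad so its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [scale * g for g in grad]


def clipped_sgd(stoch_grad, x0, step_size, threshold, num_steps, rng):
    """Run SGD where each stochastic gradient is clipped before the step.

    Returns the last iterate. All hyperparameters are illustrative
    assumptions, not the choices analyzed in the paper.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = clip(stoch_grad(x, rng), threshold)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def quartic(x):
    """f(x) = sum(x_i^4) / 4: convex, with gradient growing cubically."""
    return sum(xi ** 4 for xi in x) / 4.0


def quartic_stoch_grad(x, rng, noise_std=0.1):
    """Exact gradient x_i^3 plus Gaussian noise (assumed noise model)."""
    return [xi ** 3 + rng.gauss(0.0, noise_std) for xi in x]


rng = random.Random(0)
x0 = [3.0, -2.0]
last = clipped_sgd(quartic_stoch_grad, x0, step_size=0.05,
                   threshold=1.0, num_steps=2000, rng=rng)
print("f(x0)   =", quartic(x0))
print("f(last) =", quartic(last))
```

Far from the minimizer the gradient norm exceeds the threshold, so every step has length at most `step_size * threshold`; close to the minimizer the clipping is inactive and the update reduces to plain SGD. This is the mechanism that lets a fixed step size cope with gradients that grow faster than linearly.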
