Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD for constant learning rates, thereby establishing asymptotic stationarity of the iterates. Building on this, we derive the q-th moment convergence of SGD and ASGD for any q ≥ 2 in general ℓ^s-norms and, in particular, in the ℓ^∞-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis, which yields probabilistic bounds for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, our proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
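To make the two recursions in the abstract concrete, the following is a minimal sketch of constant learning-rate SGD together with its Ruppert-Polyak running average. It is not the authors' code: the least-squares test problem, the function names (`sgd_with_averaging`, `make_least_squares_grad`, `sup_norm_error`), the dimension, the learning rate and the step count are all assumptions chosen for illustration. The error is measured in the sup-norm (ℓ^∞) only because the abstract highlights that norm.

```python
import random


def sgd_with_averaging(grad, theta0, lr, n_steps, rng):
    """Constant learning-rate SGD plus the Ruppert-Polyak running average."""
    theta = list(theta0)
    theta_bar = [0.0] * len(theta)
    for k in range(1, n_steps + 1):
        g = grad(theta, rng)  # stochastic gradient from one fresh sample
        # SGD step: theta_k = theta_{k-1} - lr * g, so the new iterate is a
        # random function of the previous one (an autoregressive recursion).
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        # ASGD: running mean of the iterates theta_1, ..., theta_k.
        theta_bar = [b + (t - b) / k for b, t in zip(theta_bar, theta)]
    return theta, theta_bar


def make_least_squares_grad(theta_star, noise_sd):
    """Hypothetical test problem: y = <x, theta_star> + noise, squared loss."""
    def grad(theta, rng):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, noise_sd)
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        return [residual * xi for xi in x]
    return grad


def sup_norm_error(a, b):
    """Sup-norm (l-infinity) distance between two vectors."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


# Illustrative values only; none of these come from the paper.
theta_star = [1.0, -2.0, 0.5, 0.0, 3.0]
rng = random.Random(0)
grad = make_least_squares_grad(theta_star, noise_sd=0.5)
theta_last, theta_avg = sgd_with_averaging(
    grad, [0.0] * len(theta_star), lr=0.05, n_steps=20000, rng=rng
)
err_last = sup_norm_error(theta_last, theta_star)
err_avg = sup_norm_error(theta_avg, theta_star)
```

With a constant learning rate the last iterate is expected to keep fluctuating around the minimizer rather than settle, which is the stationarity picture the abstract describes, while the averaged iterate smooths those fluctuations out. Comparing `err_last` with `err_avg` over several seeds is one informal way to see the difference.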
J. Jenny Li
Hangzhou Normal University
Zhipeng Lou
National University of Singapore
Johannes Schmidt-Hieber
University of Twente
Li et al. (Mon,) studied this question.
synapsesocial.com/papers/68f3793258f37cefb60d3576 — DOI: https://doi.org/10.48550/arxiv.2510.12013