What type of study is this?

This is a quantitative study.

October 2, 2025 · Open Access

Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds

Key Points

The research reveals a Pareto frontier between the risk-sensitive index (RSI) and the convergence rate of gradient methods over the choice of step-size and momentum parameters.
For quadratic objectives with i.i.d. Gaussian noise, closed-form expressions for the RSI are given in terms of solutions to 2×2 matrix Riccati equations, clarifying how robust these methods are to gradient errors.
Beyond quadratics, the findings include finite-time high-probability guarantees and non-asymptotic large-deviation bounds for generalized momentum methods with biased sub-Gaussian gradient errors.
Numerical experiments on a robust regression problem illustrate the results.

Abstract

We study trade-offs between convergence rate and robustness to gradient errors in the context of first-order methods. Our focus is on generalized momentum methods (GMMs), a broad class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent methods, for minimizing smooth strongly convex objectives. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness of these methods to gradient errors via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed-form expressions for RSI in terms of solutions to 2×2 matrix Riccati equations, revealing a Pareto frontier between RSI and convergence rate over the choice of step-size and momentum parameters. We then prove a large-deviation principle for time-averaged suboptimality in the large iteration limit and show that the rate function is, up to a scaling, the convex conjugate of the RSI function. We further show that the rate function and RSI are linked to the H∞-norm, a measure of robustness to the worst-case deterministic gradient errors, so that stronger worst-case robustness (smaller H∞-norm) leads to sharper decay of the tail probabilities for the average suboptimality. Beyond quadratics, under potentially biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, yielding finite-time high-probability guarantees and non-asymptotic large-deviation bounds for the averaged iterates. In the case of smooth strongly convex functions, we also observe an analogous trade-off between RSI and convergence-rate bounds. To our knowledge, these are the first non-asymptotic guarantees for GMMs with biased gradients and the first risk-sensitive analysis of GMMs. Finally, we provide numerical experiments on a robust regression problem to illustrate our results.
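
To make the setting in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the common three-parameter form of a generalized momentum method (step size alpha, momentum beta, extrapolation nu) applied to a diagonal quadratic, with a gradient oracle corrupted by a constant bias plus Gaussian noise. The function names `gmm_run` and `empirical_risk_index` are hypothetical, and the normalization used in the risk-sensitive estimate is only illustrative; the paper's exact RSI definition may differ.

```python
import math
import random


def gmm_run(alpha, beta, nu, eigs, x0, n_iters, sigma=0.0, bias=0.0, seed=0):
    """Run a generalized momentum method on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    Assumed update (three-parameter form, an assumption of this sketch):
        y_k     = x_k + nu * (x_k - x_{k-1})
        x_{k+1} = x_k - alpha * g(y_k) + beta * (x_k - x_{k-1})
    where g(y) = grad f(y) + bias + sigma * N(0, I) is a biased, noisy gradient.
    Returns the list of suboptimality values f(x_k) - f*, with f* = 0 here.
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    x = list(x0)
    subopt = []
    for _ in range(n_iters):
        # Record suboptimality of the current iterate before updating it.
        subopt.append(0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x)))
        x_next = []
        for lam, xi, xpi in zip(eigs, x, x_prev):
            yi = xi + nu * (xi - xpi)                          # extrapolated point
            gi = lam * yi + bias + sigma * rng.gauss(0.0, 1.0)  # corrupted gradient
            x_next.append(xi - alpha * gi + beta * (xi - xpi))
        x_prev, x = x, x_next
    return subopt


def empirical_risk_index(theta, runs):
    """Monte Carlo estimate of (1 / (theta * K)) * log E[exp(theta * S_K)],
    where S_K is the sum of suboptimalities over K iterations of one run.
    Illustrative normalization only; uses log-sum-exp for numerical stability.
    """
    n_steps = len(runs[0])
    totals = [theta * sum(r) for r in runs]
    m = max(totals)
    log_mean = m + math.log(sum(math.exp(t - m) for t in totals) / len(totals))
    return log_mean / (theta * n_steps)


if __name__ == "__main__":
    eigs = [1.0, 10.0]                 # strong convexity mu = 1, smoothness L = 10
    alpha = 1.0 / max(eigs)
    kappa = max(eigs) / min(eigs)
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    # nu = beta gives a Nesterov-type method under the assumed parameterization.
    runs = [gmm_run(alpha, beta, beta, eigs, [1.0, 1.0], 200,
                    sigma=0.5, bias=0.1, seed=s) for s in range(50)]
    mean_avg = sum(sum(r) / len(r) for r in runs) / len(runs)
    print("mean time-averaged suboptimality:", mean_avg)
    print("empirical risk-sensitive estimate:", empirical_risk_index(0.1, runs))
```

Under this assumed parameterization, nu = 0 gives the heavy-ball method, nu = beta gives Nesterov's accelerated gradient, and beta = nu = 0 gives gradient descent. Sweeping alpha and beta and plotting the noiseless decay rate against the risk-sensitive estimate is one way to explore the kind of trade-off the abstract describes, although such a simulation is only an empirical proxy for the closed-form and non-asymptotic results.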

Authors

Mert Gürbüzbalaban

Yasa Syed

Necdet Serhat Aybat
