What type of study is this?

This is a Literature Review study.

September 5, 2025 · Open Access

Modern non-linear optimization algorithm of large-scale for combinatorial approximate methods

Key Points

Stochastic gradient descent (SGD) makes faster parameter updates, while gradient descent (GD) converges more stably.
Gradient descent achieves high accuracy but becomes inefficient on large datasets, where stochastic gradient descent scales well.
Both methods minimize a loss function to find optimal parameters; they differ in how they trade efficiency against stability.
The choice between SGD and GD impacts the performance of machine learning models, especially with large datasets.

Abstract

In this paper, we compare two optimization algorithms used to minimize loss functions in machine learning: stochastic gradient descent (SGD) and gradient descent (GD). GD updates the model parameters by calculating the gradient over the entire dataset before taking a step. This ensures stable convergence but is computationally expensive. SGD, by contrast, updates the parameters after processing a single randomly chosen data point, which makes each update much cheaper but introduces noise into the gradient estimate. GD follows a smooth path to a minimum, while SGD takes a noisy, winding path that sometimes overshoots a local minimum but can also escape one. On large datasets GD becomes inefficient, whereas SGD scales well and is typically used in deep learning. Both methods aim to find the optimal parameters for a machine learning model; they balance stability and efficiency differently, with GD favoring accuracy and SGD favoring speed.
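The difference between the two update rules can be illustrated with a minimal sketch. It is not taken from the paper: the one-dimensional linear model, the mean squared error loss, the synthetic data, the learning rates, and the names `make_data`, `gradient`, `gd`, and `sgd` are all assumptions chosen for illustration. GD calls `gradient` on the full dataset at every step, while SGD calls it on one randomly chosen point.

```python
import random


def make_data(n, seed=0):
    """Synthetic data (an assumption for this sketch): y = 3x + 0.5 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 0.5 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def gradient(w, b, xs, ys):
    """Gradient of the mean squared error of y ≈ w*x + b over the given points."""
    n = len(xs)
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb


def gd(xs, ys, lr=0.1, steps=500):
    """Gradient descent: every step uses the gradient over the entire dataset."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b, xs, ys)
        w -= lr * gw
        b -= lr * gb
    return w, b


def sgd(xs, ys, lr=0.05, steps=5000, seed=1):
    """Stochastic gradient descent: every step uses one random data point."""
    rng = random.Random(seed)
    w = b = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))
        gw, gb = gradient(w, b, [xs[i]], [ys[i]])  # noisy single-point estimate
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    xs, ys = make_data(200)
    print("GD :", gd(xs, ys))
    print("SGD:", sgd(xs, ys))
```

In this sketch each GD step costs one pass over all 200 points, whereas each SGD step touches a single point, so SGD can take many more steps for the same amount of computation. With a constant learning rate, the SGD estimate keeps fluctuating around the minimum rather than settling exactly on it, which is the noise the abstract refers to.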
