Sharpness-Aware Minimization (SAM) has emerged as a state-of-the-art optimization framework that improves the generalization of deep neural networks by actively seeking flatter regions of the loss landscape. Despite its widespread empirical success, the convergence properties of SAM remain only partially understood. In particular, standard deterministic SAM with a normalized perturbation often exhibits a persistent, non-vanishing steady-state error and fails to converge to the exact minimum even on simple deterministic quadratic objectives. In this paper, we conduct a rigorous mathematical and empirical analysis of this phenomenon. We prove that on data-driven quadratic loss surfaces with anisotropic Hessian curvature, the optimization trajectory of standard SAM is asymptotically trapped in a stable period-2 limit cycle. We derive the exact closed-form radius R of this limit cycle as a function of the learning rate, the perturbation radius, and the maximum eigenvalue of the Hessian, and show that the trajectory oscillates perpetually along the dominant eigenvector. Furthermore, we analyze the transverse stability of this limit cycle and prove that perturbations along sub-dominant eigenspaces contract at a rate governed by an analytical amplification factor. To resolve this fundamental bottleneck, we analyze Gradient-Scaled SAM (GS-SAM) and show that scaling the perturbation with the gradient norm restores linear convergence to the exact global optimum. Numerical simulations confirm our theoretical predictions to machine precision.
Prabesh Dahal (Sun,) studied this question.
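The behavior described in the abstract can be illustrated with a minimal numerical sketch. It rests on several assumptions that are not taken from the paper: the objective is f(w) = ½ wᵀHw with a diagonal 2×2 Hessian, the values of `lr` and `rho` are arbitrary, and GS-SAM is taken to mean the unnormalized perturbation ε = ρ∇f(w). The radius `R_pred` is what the one-dimensional fixed-point condition w⁺ = −w gives under these assumptions. It may differ in form or notation from the paper's closed-form expression.

```python
import numpy as np

def sam_step(w, H, lr, rho, scaled=False):
    """One deterministic SAM step on f(w) = 0.5 * w^T H w (illustrative)."""
    g = H @ w  # gradient of the quadratic at w
    if scaled:
        # Assumed GS-SAM variant: perturbation proportional to the gradient.
        eps = rho * g
    else:
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:
            return w  # already at the minimum; normalized direction undefined
        # Standard SAM: perturbation of fixed length rho along the gradient.
        eps = rho * g / gnorm
    # Descend using the gradient evaluated at the perturbed point.
    return w - lr * (H @ (w + eps))

def run(w0, H, lr, rho, steps, scaled=False):
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = sam_step(w, H, lr, rho, scaled)
    return w

# Hypothetical anisotropic quadratic and hyperparameters (assumptions).
H = np.diag([4.0, 1.0])
lr, rho = 0.1, 0.05
lam_max = 4.0

# Radius implied by w_next = -w along the dominant eigenvector (our derivation):
#   -R = (1 - lr*lam) * R - lr*lam*rho  =>  R = lr*lam*rho / (2 - lr*lam)
R_pred = lr * lam_max * rho / (2.0 - lr * lam_max)

w_sam = run([1.0, 1.0], H, lr, rho, steps=2000)
w_next = sam_step(w_sam, H, lr, rho)            # one more step: should flip sign
w_gs = run([1.0, 1.0], H, lr, rho, steps=2000, scaled=True)

print("predicted radius      :", R_pred)
print("SAM iterate norm      :", np.linalg.norm(w_sam))
print("SAM iterate, next step:", w_sam, w_next)
print("GS-SAM iterate norm   :", np.linalg.norm(w_gs))
```

In this setup the normalized update leaves a residual of fixed size ρ inside the step no matter how small the gradient becomes, whereas the scaled update reduces to the linear map w⁺ = (I − ηH − ηρH²)w, which contracts whenever |1 − ηλ − ηρλ²| < 1 for every eigenvalue λ of H.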