What question did this study set out to answer?

The study aims to develop a better flatness indicator for neural network optimization, so that training reaches flatter minima and generalizes better.

March 15, 2026

Bilateral Sharpness-Aware Minimization for Flatter Minima

Key Points

The aim is to create a better flatness indicator for optimizing neural networks to enhance generalization.
Proposed bilateral sharpness-aware minimization (BSAM) by merging max-sharpness and min-sharpness.
Analyzed the current limitations of sharpness-aware minimization (SAM).
Conducted extensive experiments to compare BSAM with vanilla SAM across multiple tasks.
BSAM achieves superior generalization performance compared to traditional SAM.
Demonstrated lower Hessian eigenvalues resulting in flatter minima.
Reported improved robustness across various tasks including classification and semantic segmentation.

Abstract

Sharpness-aware minimization (SAM) enhances generalization by minimizing max-sharpness (MaxS). Despite its practical success, we empirically found that the MaxS behind SAM's generalization enhancements faces the "flatness indicator problem" (FIP), where SAM only considers the flatness in the direction of gradient ascent. This leads to high Hessian eigenvalues for the deep neural network (DNN), indicating insufficient flatness in the solution region. A better flatness indicator (FI) would lower these Hessian eigenvalues, resulting in a flatter minimum and improved generalization of the network. SAM is inherently a greedy search method. In this article, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as min-sharpness (MinS). By merging MaxS and MinS, we create a better FI that indicates a flatter direction during the optimization. Specifically, we combine this FI with SAM into the proposed bilateral SAM (BSAM), which finds a flatter minimum than SAM. The theoretical analysis demonstrates that BSAM converges to a local minimum. Extensive experiments demonstrate that BSAM offers superior generalization performance and robustness compared to vanilla SAM across various tasks, i.e., classification, transfer learning, human pose estimation, semantic segmentation, and network quantization.
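To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy quadratic loss as a stand-in for the training loss and approximates the maximum and minimum loss in the neighborhood with a single normalized gradient step of radius `rho` in each direction. The names `loss`, `grad`, `sharpness_estimates`, and `rho` are hypothetical. The abstract does not state how MaxS and MinS are merged, so the sketch only computes the two terms.

```python
import numpy as np

def loss(w, A):
    """Toy quadratic loss 0.5 * w^T A w (assumed stand-in for a training loss)."""
    return 0.5 * w @ A @ w

def grad(w, A):
    """Gradient of the toy loss for a symmetric matrix A."""
    return A @ w

def sharpness_estimates(w, A, rho=0.05):
    """One-step estimates of max-sharpness and min-sharpness within radius rho.

    MaxS ~ loss at the ascent point minus the current loss.
    MinS ~ current loss minus loss at the descent point.
    Both use a single normalized gradient step (an assumption of this sketch).
    """
    g = grad(w, A)
    step = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids 0/0
    base = loss(w, A)
    max_s = loss(w + step, A) - base  # loss rise along gradient ascent
    min_s = base - loss(w - step, A)  # loss drop along gradient descent
    return max_s, min_s

A = np.diag([10.0, 1.0])   # curvature differs strongly between the two axes
w = np.array([1.0, 1.0])
max_s, min_s = sharpness_estimates(w, A)
print(max_s, min_s)
```

On a convex quadratic the two estimates differ by the curvature term along the step direction, so looking at both gives information that MaxS alone does not.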
