What question did this study set out to answer?

The aim is to develop an algorithm that addresses the limitations of traditional stochastic gradient descent methods in deep learning.

February 22, 2026Open Access

Preconditioned inexact stochastic ADMM for deep models

Key Points

The aim is to develop an algorithm that addresses the limitations of traditional stochastic gradient descent methods in deep learning.
Developed the PISA algorithm grounded in Lipschitz continuity assumptions.
Enables scalable parallel computing and supports various preconditions.
Introduced two variants, SISA and NSISA, incorporating second-order information.
SISA and NSISA showed superior performance over existing optimizers in training diverse deep models.
Effective handling of data heterogeneity was demonstrated in various experimental settings.

Abstract

Abstract Deep learning models are usually trained with stochastic gradient descent-based algorithms, but these optimizers face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. Here we develop an algorithm called PISA (preconditioned inexact stochastic alternating direction method of multipliers). Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables the proposed algorithm to tackle the challenge of data heterogeneity effectively. Moreover, the algorithmic architecture enables scalable parallel computing and supports various preconditions, such as second-order information, second moment and orthogonalized momentum by Newton–Schulz iterations. Incorporating the last two preconditions in PISA yields two computationally efficient variants: SISA and NSISA. Comprehensive experimental evaluations for training or fine-tuning diverse deep models, including vision models, large language models, reinforcement learning models, generative adversarial networks and recurrent neural networks, demonstrate superior numerical performance of SISA and NSISA compared with various state-of-the-art optimizers.

Bookmark

View Full Paper

Cite This Study

Zhou et al. (Fri,) studied this question.

synapsesocial.com/papers/699a9d3c482488d673cd2f71 https://doi.org/https://doi.org/10.1038/s42256-026-01182-3

Bookmark

View Full Paper