Stochastic variance-reduced gradient (SVRG) is well suited to reducing gradient variance in deep neural network (DNN) training, but adaptation challenges hinder its direct application. To address this, this paper proposes a set of strategies built on adaptive alternating learning rates that adapt SVRG to DNN training. First, in the outer loop of SVRG, both the full gradient and a learning rate specific to DNN training are computed. The learning rate can be calculated with two distinct formulas, and an alternating strategy applies them in turn across iterations, so that DNN weight updates are guided by both parameter change rates and gradient change rates. A threshold method then corrects the learning rate into an appropriate range, which accelerates convergence. Second, in the inner loop of SVRG, DNN weights are updated using the mini-batch average gradient together with the proposed learning rate. At the same time, an inertia strategy refines and aggregates the mini-batch average gradients from the inner-loop iterations into a single gradient with reduced variance. This refined gradient is passed back to the outer loop to recalculate the learning rate. The proposed algorithm is validated on LeNet, VGG11, ResNet34, and DenseNet121 and compared against several classic and advanced optimizers. Experimental results show that it trains robustly across DNN models with diverse characteristics. In terms of training convergence, it is competitive with state-of-the-art algorithms such as Lion, developed by the Google Brain team.
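The two-loop structure described above can be illustrated with a minimal sketch on a toy least-squares problem. Everything specific here is an assumption, since the abstract does not give the formulas: the two learning-rate formulas are taken to be Barzilai-Borwein-style ratios built from the parameter change and the gradient change, the "inertia" aggregation is taken to be an exponential moving average, and the function name `alternating_lr_sgd`, the clipping bounds, and the coefficient `beta` are hypothetical. The full-gradient computation in the outer loop is omitted; the aggregated gradient stands in for it when the learning rate is recomputed.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Half mean squared error of a linear model and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)


def alternating_lr_sgd(X, y, outer_iters=30, batch=16, lr0=0.01,
                       lr_min=1e-3, lr_max=0.1, beta=0.9, seed=0):
    """Two-loop training with an alternating, clipped learning rate (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    lr = lr0
    w_prev = g_prev = None
    losses = [loss_and_grad(w, X, y)[0]]  # loss before any update
    lrs = []
    for k in range(outer_iters):
        lrs.append(lr)
        # Inner loop: mini-batch updates with the current learning rate.
        g_agg = np.zeros(d)
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            _, g = loss_and_grad(w, X[idx], y[idx])
            w = w - lr * g
            # Assumed "inertia": exponential moving average of the
            # mini-batch gradients, reset at the start of each outer pass.
            g_agg = beta * g_agg + (1.0 - beta) * g
        # Outer loop: recompute the learning rate from the parameter
        # change (s) and the aggregated-gradient change (dg).
        if w_prev is not None:
            s, dg = w - w_prev, g_agg - g_prev
            sy = s @ dg
            if sy > 0:  # keep the old rate when the ratio is not usable
                # Assumed formulas: alternate the two ratios by parity.
                cand = (s @ s) / sy if k % 2 == 0 else sy / (dg @ dg)
                # Threshold step: clip the rate into [lr_min, lr_max].
                lr = float(np.clip(cand, lr_min, lr_max))
        w_prev, g_prev = w.copy(), g_agg.copy()
        losses.append(loss_and_grad(w, X, y)[0])
    return w, losses, lrs


# Toy data: a noisy linear model with a fixed, hypothetical weight vector.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(256, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y = X @ w_true + 0.1 * data_rng.normal(size=256)
w, losses, lrs = alternating_lr_sgd(X, y)
```

The clipping bounds matter in this sketch because the two ratios are computed from noisy aggregated gradients and can be far too large or too small; keeping the rate inside a fixed range is what keeps the toy run stable.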
Zou et al. studied this question.