Conjugate gradient (CG) and second-order information (SOI) receive increasing interest due to their crucial role in improving stochastic first-order (SFO) algorithms for solving machine learning problems. Although a variety of stochastic CG (SCG) and stochastic second-order (SSO) algorithms were studied, nearly all works only focus on either empirical or theoretical aspects for specific tasks by using SFO algorithms with CG or SOI separately. Thus, it is desired to explore the effect of the combination of CG and SOI on SFO algorithms. To cope with this issue, we develop a type of fast and low-cost SSO conjugate gradient (S2CG) algorithms, extending the family of SCG and SSO algorithms. Prior to our work,second-order CG methods only apply CG to solve the suboptimization problem, such as the (quasi-)Newton equation, which is time-consuming for large-scale optimization problems or problems requiring high-precision solutions. We show that the transition from SCG to SSO is driven by the conjugate coefficient with SOI. Moreover, we provide theoretical guarantees of different S2CG algorithms for strongly convex, general convex, and Polyak-Łojasiewicz (PL) cases. Finally, extensive experiments on different machine learning tasks, covering support vector machine (SVM), logistic regression (LR), federated learning (FL), and neural networks (NNs), significantly validate the superiority and robustness of S2CG algorithms over state-of-the-art SFO, SCG, and SSO algorithms.
Zhuang Yang (Thu,) studied this question.