February 23, 2024Open Access

Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer

Key Points

Key points are not available for this paper at this time.

Abstract

Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using two forward passes. However, these optimizers are plagued by the heterogeneity of parameter curvatures across different dimensions. In this work, we propose HiZOO, a diagonal Hessian informed zeroth-order optimizer which is the first work to leverage the diagonal Hessian to enhance zeroth-order optimizer for fine-tuning LLMs. What's more, HiZOO avoids the expensive memory cost and only increases one forward pass per step. Extensive experiments on various models (350M~66B parameters) indicate that HiZOO improves model convergence, significantly reducing training steps and effectively enhancing model accuracy. Moreover, we visualize the optimization trajectories of HiZOO on test functions, illustrating its effectiveness in handling heterogeneous curvatures. Lastly, we provide theoretical proofs of convergence for HiZOO. Code is publicly available at https://anonymous.4open.science/r/HiZOO27F8.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Cite this study

Zhao et al. (Fri,) studied this question.

synapsesocial.com/papers/68e77f50b6db6435876f2d95 — DOI: https://doi.org/10.48550/arxiv.2402.15173

Authors

Yanjun Zhao

Nanjing University of Science and Technology

Sizhe Dang

Haishan Ye

Chinese University of Hong Kong

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Cite this study

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion