Although motivated by the adaptation of text-to-speech synthesis models, we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework for such adaptation. However, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting, as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we use established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with low-rank adaptation (LoRA) and compare how well they preserve pre-training knowledge. Our results demonstrate that our methods can overcome catastrophic forgetting without degrading fine-tuning performance, and that the Kronecker-factored approximations preserve pre-training knowledge better than the diagonal ones.
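The diagonal case mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard form of a diagonal Laplace regularizer: a quadratic penalty on the weight shift, weighted element-wise by a diagonal curvature estimate such as the Fisher information. For LoRA, the shift of a layer is the product of the two low-rank factors, which is differentiable in both. The function names, the `strength` parameter, and the toy values below are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
# Hypothetical sketch: diagonal Laplace penalty on a LoRA weight shift.
# Pure Python for clarity; a real setup would use an autodiff framework.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

def diag_laplace_penalty(B, A, fisher_diag, strength=1.0):
    """Return 0.5 * strength * sum(F_ij * (B A)_ij ** 2).

    B: (d_out, r) and A: (r, d_in) are the LoRA factors (assumed layout).
    fisher_diag: (d_out, d_in) diagonal curvature estimate for the
    pre-trained weight, assumed to be precomputed elsewhere.
    """
    delta = matmul(B, A)  # parameter shift of the adapted layer
    total = 0.0
    for delta_row, fisher_row in zip(delta, fisher_diag):
        for d, f in zip(delta_row, fisher_row):
            total += f * d * d
    return 0.5 * strength * total

# Toy values (assumptions): d_out = 2, rank r = 1, d_in = 3.
B = [[1.0], [2.0]]
A = [[0.5, -1.0, 0.0]]
fisher = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]

penalty = diag_laplace_penalty(B, A, fisher)
zero_penalty = diag_laplace_penalty([[0.0], [0.0]], A, fisher)
```

In training, a term like this would be added to the fine-tuning loss, so that shifts in directions the curvature estimate marks as important to the pre-trained model are penalized more heavily. A Kronecker-factored variant would replace the element-wise weights with a pair of factor matrices per layer.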
Chen et al. studied this question.
synapsesocial.com/papers/68e78a60b6db6435876fcd85 — DOI: https://doi.org/10.48550/arxiv.2402.12220
Haolin Chen
Shantou University
Philip N. Garner
Idiap Research Institute