As the adoption of large language models increases and the need for per-user or per-task model customization grows, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm for LoRA, our proposed VB-LoRA composes all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-k admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, and instruction tuning tasks. When fine-tuning the Llama2-13B model, VB-LoRA uses only 0.4% of LoRA's stored parameters yet attains superior results. Our source code is available at https://github.com/leo-yangli/VB-LoRA.
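To make the "divide-and-share" idea concrete, the following is a minimal forward-pass sketch in plain Python, not the authors' implementation. It assumes each low-rank matrix is divided into fixed-length sub-vectors, and that each sub-vector is a softmax-weighted mixture of the k bank vectors with the largest logits. The function names (`top_k_admixture`, `compose_matrix`) and all sizes are hypothetical, chosen only for illustration; in training, the bank and the logits would be learned with an autodiff framework.

```python
import math
import random


def top_k_admixture(logits, bank, k):
    """Mix the k bank vectors with the largest logits.

    The mixing weights are a softmax over only those k logits.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    length = len(bank[0])
    return [sum(w * bank[i][j] for w, i in zip(weights, top)) for j in range(length)]


def compose_matrix(all_logits, bank, k, rows, cols):
    """Build a rows x cols matrix by concatenating mixed sub-vectors."""
    length = len(bank[0])
    assert cols % length == 0, "row width must be a multiple of the sub-vector length"
    per_row = cols // length
    assert len(all_logits) == rows * per_row
    matrix = []
    for r in range(rows):
        row = []
        for s in range(per_row):
            row.extend(top_k_admixture(all_logits[r * per_row + s], bank, k))
        matrix.append(row)
    return matrix


# Illustrative sizes only (assumptions, not values from the paper).
random.seed(0)
bank_size, sub_len, k = 8, 4, 2
rank, dim = 2, 8

# One bank shared by every matrix, module, and layer.
bank = [[random.uniform(-0.02, 0.02) for _ in range(sub_len)] for _ in range(bank_size)]

# Each sub-vector of each low-rank matrix owns one logit vector over the bank.
num_sub = rank * (dim // sub_len)
logits_a = [[random.gauss(0.0, 1.0) for _ in range(bank_size)] for _ in range(num_sub)]
logits_b = [[random.gauss(0.0, 1.0) for _ in range(bank_size)] for _ in range(num_sub)]

A = compose_matrix(logits_a, bank, k, rank, dim)  # rank x dim
B = compose_matrix(logits_b, bank, k, rank, dim)  # rank x dim, used as its transpose
```

In this sketch both low-rank factors draw from the same `bank`, which is the sharing across matrix dimensions that the abstract describes; extending it across modules and layers would only mean reusing the same bank for more logit sets.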
Yang et al. studied this question.
www.synapsesocial.com/papers/68e68be2b6db643587613446 — DOI: https://doi.org/10.48550/arxiv.2405.15179
Li Yang
Shaobo Han
Shihao Ji