Catastrophic forgetting is a critical challenge when fine-tuning multi-modal large language models (MLLMs): improving performance on unseen tasks often causes a significant performance drop on the original tasks. This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor. Our method preserves most of the pre-trained parameters and replaces a small fraction (10%) of them with fine-tuned parameters, maintaining 99% effectiveness on original tasks relative to the pre-trained model and achieving 97% on new tasks relative to standard fine-tuning. Specifically, we derive a sparse mask to identify the "model patch", based on a fusion strategy that integrates salience and sensitivity analysis. A compensation mechanism is then introduced to "decorate the patch", enhancing the model's performance on both target and original tasks. The method is also adaptable to multi-task scenarios. In extensive experiments on InstructBLIP and LLaVA-1.5, covering both image captioning and visual question answering, our approach demonstrates significant task adaptability while preserving inherent pre-trained capabilities.
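The patching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `graft_patch`, the flat list representation of weights, and above all the selection score are assumptions made for illustration. The abstract describes a score that fuses salience and sensitivity analysis followed by a compensation step; the sketch substitutes the absolute parameter change as a stand-in score and omits compensation entirely.

```python
def graft_patch(pretrained, finetuned, keep_ratio=0.1):
    """Keep pre-trained weights, grafting in a sparse set of fine-tuned ones.

    Returns (merged_weights, mask), where mask[i] == 1 marks a grafted position.
    """
    if len(pretrained) != len(finetuned):
        raise ValueError("weight vectors must have the same length")
    n = len(pretrained)
    # Number of positions in the patch (at least one when there are weights).
    k = max(1, round(keep_ratio * n)) if n else 0

    # ASSUMPTION: stand-in score = absolute change caused by fine-tuning.
    # The paper's fused salience/sensitivity score is not reproduced here.
    scores = [abs(f - p) for p, f in zip(pretrained, finetuned)]

    # Indices of the k highest-scoring parameters form the sparse mask.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0] * n
    for i in top:
        mask[i] = 1

    # Masked positions take the fine-tuned value; all others stay pre-trained.
    merged = [f if m else p for p, f, m in zip(pretrained, finetuned, mask)]
    return merged, mask


# Toy example with ten hypothetical weights and a 10% patch.
pretrained = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 0.1, 0.9, -0.3]
finetuned = [0.6, -1.1, 0.25, 2.0, 0.05, 3.5, -0.75, 0.1, 0.9, -0.3]
merged, mask = graft_patch(pretrained, finetuned, keep_ratio=0.1)
print(mask)
print(merged)
```

In this toy case only the weight with the largest change is grafted, so nine of the ten merged values remain exactly the pre-trained ones. A real model would apply such a mask per tensor, with a score computed from the method's own criteria.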
Didi Zhu
Nanjing Tech University
Zhongyi Sun
Hainan University
Zexi Li
Northeastern University
Zhu et al. studied this question.
synapsesocial.com/papers/68e78968b6db6435876fbd60 — DOI: https://doi.org/10.48550/arxiv.2402.12048