In this paper, we first propose MoE-Adapters, a parameter-efficient training framework that alleviates long-term forgetting in incremental learning with Vision-Language Models (VLMs). MoE-Adapters leverage incrementally added routers to activate and integrate exclusive expert adapters from a pre-defined static expert set, enabling the pre-trained CLIP to adapt efficiently to new tasks. To preserve the zero-shot capability of the VLM, we introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE-Adapters and the original CLIP, respectively. However, relying on a static expert set and a separate distribution selector can lead to parameter redundancy and increased training complexity. We therefore extend the framework to MoE-Adapters++ by introducing dynamic MoE-Adapters, which allow experts to be involved adaptively during the continual learning process. Additionally, we propose a Latent Embedding Auto-Selector (LEAS) that incorporates distribution selection within CLIP, yielding a more unified architecture. Extensive experiments across diverse settings demonstrate that the proposed method consistently surpasses previous state-of-the-art approaches while improving training efficiency.
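The routing idea described above can be sketched in a few lines: a task router scores a set of expert adapters, the top-k experts are activated, and their outputs are mixed by softmax weights. This is only an illustrative sketch, not the paper's implementation; the expert form (a linear map on a stand-in CLIP feature), the dimensions, and the top-k value are all assumptions.

```python
import numpy as np

# Illustrative sketch of mixture-of-experts adapter routing; the actual
# MoE-Adapters architecture, expert design, and hyperparameters differ.
rng = np.random.default_rng(0)
DIM, NUM_EXPERTS, TOP_K = 8, 4, 2

# Each "expert adapter" is sketched as a small linear map on a feature vector.
experts = [rng.standard_normal((DIM, DIM)) * 0.01 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))  # hypothetical task router

def moe_adapter(x):
    """Route feature x to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]            # indices of selected experts
    w = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    w /= w.sum()
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

x = rng.standard_normal(DIM)   # stand-in for a CLIP image/text feature
out = moe_adapter(x)
```

A distribution selector in the spirit of DDAS would sit in front of this function, sending out-of-distribution features to the frozen CLIP branch instead.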
Jiazuo Yu
Zichen Huang
Yunzhi Zhuge
IEEE Transactions on Pattern Analysis and Machine Intelligence
Tsinghua University
Dalian University of Technology
University of Electronic Science and Technology of China
www.synapsesocial.com/papers/68a3633d0a429f7973329f0c — DOI: https://doi.org/10.1109/tpami.2025.3597942