Prompt learning is a popular approach to continual learning: a small set of parameters is trained on top of a frozen network pre-trained on large-scale datasets, adapting the model to a sequence of tasks. Most prompt-based methods depend heavily on delicately designed prompt pools, which leads to two shortcomings: prompt inconsistency between training and inference, and prompt selection mismatch during inference. Benefiting from the uniqueness of language descriptions and the power of vision transformers, we propose a Text-Prompted Prompt Generator Network (TPPNet), which builds a text-prompted prompt (TPP) generator by encapsulating pre-trained text embeddings into the visual class token, yielding versatile TPPs that address the shortcomings of previous methods. The versatile TPP exhibits three properties: (a) Expressiveness: TPPNet generates prompts that are themselves prompted by text, leveraging the uniqueness of language semantics to improve the intra-task expressiveness of TPP; (b) Inter-task compatibility: TPP equips the visual image token with the text embeddings of both old and current tasks, absorbing old knowledge to reduce prompt inconsistency and improve resistance to forgetting; (c) Prompt-query avoidance: TPPNet avoids the prompt query process by generating instance-level prompts, and thereby handles the prompt selection mismatch during inference. We conduct experiments on four datasets, and the results show that TPPNet outperforms or is comparable to state-of-the-art methods on rehearsal-free class-incremental learning tasks.
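To make the idea of instance-level prompt generation concrete, the following is a minimal sketch and not the TPPNet architecture itself. It assumes a single cross-attention step in which each image's class token attends over frozen text embeddings of all classes seen so far, followed by a linear projection into a fixed number of prompt tokens. The function name `generate_prompts`, the weight matrices `w_q`, `w_k`, `w_v`, `w_p`, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def generate_prompts(cls_token, text_embeds, w_q, w_k, w_v, w_p, num_prompts):
    """Hypothetical instance-level prompt generator (illustrative only).

    cls_token:   (B, D) class tokens from a frozen vision backbone
    text_embeds: (C, D) frozen text embeddings of old and current classes
    returns:     (B, num_prompts, D) one set of prompts per input instance
    """
    q = cls_token @ w_q                               # (B, D) queries
    k = text_embeds @ w_k                             # (C, D) keys
    v = text_embeds @ w_v                             # (C, D) values
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (B, C) weights over classes
    fused = cls_token + attn @ v                      # text-informed class token
    prompts = fused @ w_p                             # (B, num_prompts * D)
    return prompts.reshape(cls_token.shape[0], num_prompts, -1)

# Toy usage with random stand-ins for the frozen encoders (assumed sizes).
rng = np.random.default_rng(0)
B, C, D, L = 2, 5, 8, 4
cls_token = rng.standard_normal((B, D))
text_embeds = rng.standard_normal((C, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) for _ in range(3))
w_p = rng.standard_normal((D, L * D))

prompts = generate_prompts(cls_token, text_embeds, w_q, w_k, w_v, w_p, L)
print(prompts.shape)  # (2, 4, 8)
```

Because the prompts are computed directly from each input, this sketch needs no prompt pool and no key-query lookup at inference time, which is the property the abstract calls prompt-query avoidance. In a trained model only the generator weights would be learned, while the vision and text encoders stay frozen.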
Wang et al. (Mon,) studied this question.