Language-guided supervision, which utilizes a frozen semantic target from a Pretrained Language Model (PLM), has emerged as a promising paradigm for visual Continual Learning (CL). However, relying on a single target introduces two critical limitations: 1) semantic ambiguity, where a polysemous category name results in conflicting visual representations, and 2) intra-class visual diversity, where a single prototype fails to capture the rich variety of visual appearances within a class. To this end, we propose MuproCL, a novel framework that replaces the single target with multiple, context-aware prototypes. Specifically, we employ a lightweight LLM agent to perform category disambiguation and visual-modal expansion to generate a robust set of semantic prototypes. A LogSumExp aggregation mechanism allows the vision model to adaptively align with the most relevant prototype for a given image. Extensive experiments across various CL baselines demonstrate that MuproCL consistently enhances performance and robustness, establishing a more effective path for language-guided continual learning.
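The LogSumExp aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity scoring, the temperature value, the toy two-dimensional embeddings, and the names `class_score`, `prototypes`, and `image` are all assumptions made for illustration. The idea shown is only that each class holds several prototypes and that a smooth maximum over the per-prototype similarities lets the best-matching prototype dominate the class score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def logsumexp(values):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def class_score(image_embedding, class_prototypes, temperature=0.1):
    """Aggregate similarities to all prototypes of one class.

    Assumption: scaled cosine similarities are combined with LogSumExp,
    which acts as a smooth maximum over the prototypes.
    """
    scaled = [cosine(image_embedding, p) / temperature for p in class_prototypes]
    return temperature * logsumexp(scaled)


# Hypothetical toy data: two senses of "crane", two prototypes each.
prototypes = {
    "crane_bird": [[1.0, 0.0], [0.8, 0.6]],
    "crane_machine": [[0.0, 1.0], [-0.6, 0.8]],
}
image = [0.9, 0.1]  # stand-in for a vision-encoder output

scores = {name: class_score(image, protos) for name, protos in prototypes.items()}
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

With a small temperature the aggregated score stays close to the highest single-prototype similarity, while a larger temperature spreads weight across all prototypes of the class. Which setting the paper uses is not stated in the abstract.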
Xiwei Liu
Yulong Li
Yichen Li
Liu et al. studied this question.
www.synapsesocial.com/papers/68e040eda99c246f578b33c8 — DOI: https://doi.org/10.48550/arxiv.2509.16011