What question did this study set out to answer?

The study aims to improve continual video instance segmentation by retaining previously learned knowledge while adapting to new categories.

March 15, 2026

CRISP: Contrastive Residual Injection and Semantic Prompting for Continual Video Instance Segmentation

Key Points

Targets continual video instance segmentation: learning new categories without losing previously learned ones.
Introduces the Contrastive Residual Injection and Semantic Prompting (CRISP) framework.
Develops an instance correlation loss for instance-wise learning.
Builds adaptive residual semantic prompts (ARSP) and a query-prompt matching mechanism for category-wise learning.
Adds a contrastive semantic consistency loss to keep object queries and residual prompts coherent during incremental training.
Introduces an initialization strategy for incremental prompts to preserve inter-task correlation in the query space.
Reports that CRISP significantly outperforms existing continual segmentation methods on the long-term continual video instance segmentation task.
Avoids catastrophic forgetting of previously learned categories.
Improves segmentation and classification performance on the YouTube-VIS-2019 and YouTube-VIS-2021 datasets.

Abstract

Continual video instance segmentation (CVIS) requires the plasticity to absorb new categories while maintaining the stability to retain previously learned knowledge. Crucially, the model must also preserve the temporal consistency of instances across video frames. In this work, we introduce Contrastive Residual Injection and Semantic Prompting (CRISP), a framework tailored to address instance-wise, category-wise, and task-wise confusion in CVIS. For instance-wise learning, we model instance tracking and construct an instance correlation loss, which emphasizes the correlation with the prior query space while strengthening the specificity of the current task query. For category-wise learning, we build an adaptive residual semantic prompt (ARSP) learning framework, which constructs a learnable semantic residual prompt pool generated from category text and uses an adaptive query-prompt matching mechanism to map the queries of the current task to the semantic residual prompts. In addition, a semantic consistency loss based on contrastive learning is introduced to maintain semantic coherence between object queries and residual prompts during incremental training. For task-wise learning, to ensure inter-task correlation within the query space, we introduce a concise yet powerful initialization strategy for incremental prompts. Extensive experiments on the YouTube-VIS-2019 and YouTube-VIS-2021 datasets demonstrate that CRISP significantly outperforms existing continual segmentation methods in the long-term continual video instance segmentation task, avoiding catastrophic forgetting and effectively improving segmentation and classification performance. The code is available at https://github.com/01upup10/CRISP.
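
The abstract names a query-prompt matching step and a contrastive semantic consistency loss but does not give their formulas. The sketch below is only a generic illustration of the idea, not the authors' implementation: it assumes cosine similarity for matching, a greedy nearest-prompt assignment, and a standard InfoNCE-style loss. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_queries_to_prompts(queries, prompts):
    """Assumed greedy matching: each query takes its most similar prompt."""
    return [
        max(range(len(prompts)), key=lambda j: cosine(q, prompts[j]))
        for q in queries
    ]


def info_nce(queries, prompts, targets, temperature=0.1):
    """Mean InfoNCE-style loss (an assumption, not the paper's formula).

    Each query is pulled toward its matched prompt and pushed away
    from every other prompt in the pool.
    """
    total = 0.0
    for q, t in zip(queries, targets):
        logits = [cosine(q, p) / temperature for p in prompts]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[t]
    return total / len(queries)


# Toy 2-D embeddings standing in for object queries and residual prompts.
queries = [[1.0, 0.1], [0.0, 1.0]]
prompts = [[1.0, 0.0], [0.1, 1.0]]

targets = match_queries_to_prompts(queries, prompts)
loss = info_nce(queries, prompts, targets)
print(targets, loss)
```

With well-aligned queries and prompts the loss is close to zero. Swapping the targets so that each query is paired with the wrong prompt raises it sharply, which is the behavior a consistency term of this kind relies on.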
