What does this research mean for the field?

MDPENet enhances prototype representation and generalization for few-shot semantic segmentation, outperforming classical methods on benchmark datasets. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to improve the accuracy of few-shot semantic segmentation by enhancing prototype representation and generalization.

February 22, 2026 · Open Access

MDPENet: multimodal-driven prototype evolving network for few-shot semantic segmentation

Key Points

Developed a novel Multimodal-Driven Prototype Evolving Network (MDPENet)
Included Support Feature Enhancement Module (SFEM) for multimodal feature interactions
Designed Query Feature Disentanglement Module (QFDM) to reduce semantic interference
Implemented Prototype Evolution Module (PEM) for refining prototype sets
MDPENet shows improved segmentation accuracy over classical few-shot semantic segmentation methods
Demonstrated effectiveness on the benchmark datasets PASCAL-5i and COCO-20i
Reduced prototype bias and better utilized multimodal information

Abstract

Few-shot semantic segmentation (FSS) aims to predict segmentation masks for unseen objects using only a limited number of annotated samples. Among various approaches, prototype learning has been widely adopted in FSS, where prototype vectors derived from seen categories (support images) are transferred to novel categories (query images) to guide the segmentation of unseen objects. Although prototype-based methods have achieved considerable progress, they still suffer from prototype bias and insufficient utilization of limited multimodal information. To address these issues, we propose a Multimodal-Driven Prototype Evolving Network (MDPENet), designed to enhance prototype representation and generalization. The proposed network primarily consists of three modules: the Support Feature Enhancement Module (SFEM), the Query Feature Disentanglement Module (QFDM), and the Prototype Evolution Module (PEM). Specifically, the SFEM establishes multimodal feature interactions between the text label features encoded by Contrastive Language-Image Pre-training (CLIP) and the separated support foreground features, thereby enhancing the representational quality and robustness of the support features. The QFDM then integrates the CLIP-encoded text label features with the support foreground features to disentangle the whole query feature, effectively reducing semantic interference among mixed query representations. Finally, the PEM evolves and refines the prototype set using the enhanced support and disentangled query foreground features at a fine-grained level. Extensive experiments on the benchmark datasets PASCAL-5i and COCO-20i demonstrate the superiority of our MDPENet compared to classical FSS methods.
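
The prototype-matching idea that the abstract builds on can be illustrated with a minimal sketch. This is not MDPENet itself: it shows only a generic baseline, masked average pooling over support features followed by cosine-similarity matching on the query, which is a common choice in prototype-based FSS. The function names, the 0.5 threshold, and the toy feature maps are assumptions made for illustration; the SFEM, QFDM, and PEM modules and the CLIP text features are not modeled here.

```python
import numpy as np


def masked_average_pooling(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the feature vectors at foreground positions into one prototype.

    features: (C, H, W) feature map; mask: (H, W) binary foreground mask.
    """
    mask = mask.astype(features.dtype)
    total = (features * mask[None]).sum(axis=(1, 2))  # (C,)
    return total / max(mask.sum(), 1e-6)              # guard against an empty mask


def cosine_similarity_map(features: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """Cosine similarity between the prototype and every spatial position."""
    eps = 1e-8
    feat_norm = np.linalg.norm(features, axis=0) + eps          # (H, W)
    proto_norm = np.linalg.norm(prototype) + eps
    dots = np.tensordot(prototype, features, axes=([0], [0]))   # (H, W)
    return dots / (feat_norm * proto_norm)


def segment_query(support_feat, support_mask, query_feat, threshold=0.5):
    """Predict a query foreground mask from a single support prototype.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    prototype = masked_average_pooling(support_feat, support_mask)
    similarity = cosine_similarity_map(query_feat, prototype)
    return similarity > threshold


# Toy data (hypothetical): 2 channels on a 2x2 grid.
# Foreground pixels carry the vector [1, 0]; background pixels carry [0, 1].
support_feat = np.array([[[1.0, 1.0], [0.0, 0.0]],
                         [[0.0, 0.0], [1.0, 1.0]]])
support_mask = np.array([[1, 1], [0, 0]])   # top row is foreground
query_feat = np.array([[[1.0, 0.0], [1.0, 0.0]],
                       [[0.0, 1.0], [0.0, 1.0]]])  # left column is foreground

predicted_mask = segment_query(support_feat, support_mask, query_feat)
print(predicted_mask)
```

In this toy case the single prototype separates foreground from background cleanly. With real features, a prototype pooled from a few support images may not match the query object's appearance, which is the prototype bias the abstract describes and the motivation for refining prototypes with additional text and query information.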
