Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions by leveraging prior knowledge of known primitives. However, real-world visual features of attributes and objects are often entangled, causing distribution shifts between seen and unseen combinations. Existing methods often ignore intrinsic variations and interactions among primitives, leading to poor feature discrimination and biased predictions. To address these challenges, we propose Multi-level Contextual Prototype Modulation (MCPM), a transformer-based framework with a hierarchical structure that effectively integrates attributes and objects to generate richer visual embeddings. At the feature level, we apply contrastive learning to improve discriminability across compositional tasks. At the prototype level, a subclass-driven modulator captures fine-grained attribute-object interactions, enabling better adaptation to long-tail distributions. Additionally, we introduce a Minority Attribute Enhancement (MAE) strategy that synthesizes virtual samples by mixing attribute classes, further mitigating data imbalance. Experiments on four benchmark datasets (MIT-States, C-GQA, UT-Zappos, and VAW-CZSL) show that MCPM brings significant performance improvements, verifying its effectiveness in complex composition scenes.
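The abstract describes Minority Attribute Enhancement only as synthesizing virtual samples by mixing attribute classes and gives no further detail. As a rough illustration of that general idea, the following is a minimal mixup-style sketch, not the authors' implementation. The function name `synthesize_minority_samples`, the choice of the least frequent attribute as the minority class, the Beta-distributed mixing weight, and the soft-label format are all assumptions introduced here for illustration.

```python
import random
from collections import Counter


def synthesize_minority_samples(features, attr_labels, num_new, alpha=0.4, seed=0):
    """Hypothetical mixup-style sketch: blend minority-attribute features
    with features from other attribute classes to create virtual samples."""
    rng = random.Random(seed)
    counts = Counter(attr_labels)
    # Assumption: the least frequent attribute is treated as the minority class.
    minority = min(counts, key=lambda a: (counts[a], str(a)))
    minority_idx = [i for i, a in enumerate(attr_labels) if a == minority]
    other_idx = [i for i, a in enumerate(attr_labels) if a != minority]
    if not other_idx:
        raise ValueError("need at least two attribute classes to mix")

    synthetic = []
    for _ in range(num_new):
        i = rng.choice(minority_idx)
        j = rng.choice(other_idx)
        # Assumption: Beta(alpha, alpha) mixing weight, folded to >= 0.5
        # so the minority sample dominates the blend.
        lam = rng.betavariate(alpha, alpha)
        lam = max(lam, 1.0 - lam)
        mixed = [lam * x + (1.0 - lam) * y for x, y in zip(features[i], features[j])]
        soft_label = {minority: lam, attr_labels[j]: 1.0 - lam}
        synthetic.append((mixed, soft_label))
    return synthetic


# Toy usage with made-up 2-D features and attribute names.
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
toy_labels = ["wet", "dry", "dry", "dry"]
virtual = synthesize_minority_samples(toy_features, toy_labels, num_new=3)
```

Each virtual sample pairs a blended feature vector with a soft attribute label whose weights sum to one. In a real pipeline the mixing would presumably operate on learned visual embeddings rather than raw lists, but the abstract does not specify this.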
Liu et al. studied this question.