ABSTRACT
In recent years, prompt learning has shown promise in transferring pretrained vision‐language models (VLMs) to downstream tasks. However, existing methods face two challenges in improving generalisation: (1) methods that exploit the collaborative effect of multimodal prompts often assume that the text and visual modalities share the same prompt requirements, which neglects the distinct hierarchical processing of their encoders and leads to prompt imbalance; and (2) current methods adapt poorly to diverse distribution shift scenarios, including class distribution shifts and image content variations. To address these challenges, we propose a diversified composite prompt learning (DCPL) framework that integrates unified and specific prompts. Specifically, to alleviate multimodal prompt imbalance, we design a shared root multimodal prompting strategy, which employs a shared root prompt and an independent derivation mechanism to generate the derived multimodal prompt (DMP), enabling independent deep prompting while maintaining implicit synergy across modalities. Furthermore, we design a dual‐branch dynamic adaptive prompting strategy that produces the derived class‐specific prompt (DCP) and the image‐specific prompt (ISP), driven by inter‐class relations and image‐patch context, respectively, to enhance adaptability across different distribution shifts. Extensive experiments on base‐to‐novel generalisation, cross‐dataset transfer, domain generalisation and few‐shot learning demonstrate that DCPL achieves superior performance, validating its robustness and generalisation.
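The shared root idea in the abstract can be illustrated with a small sketch. The abstract does not specify how the derivation mechanism works, so everything below is an assumption made for illustration: the class name `SharedRootPrompt`, the use of one independent linear projection per encoder layer, and all dimensions and depths are hypothetical and are not taken from the paper. The sketch only shows how a single root prompt could feed two modalities that each use their own prompt depth and width.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix of shape (out_dim, in_dim)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weight, vec):
    """Multiply a weight matrix by a vector (no bias term)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


class SharedRootPrompt:
    """Hypothetical sketch: one root prompt, independent per-layer derivations.

    Assumption: each encoder layer gets its own linear projection of the
    shared root, so text and visual branches may differ in depth and width.
    """

    def __init__(self, n_tokens, root_dim, text_dim, vis_dim,
                 text_depth, vis_depth, seed=0):
        rng = random.Random(seed)
        # Shared root prompt: n_tokens vectors of size root_dim.
        self.root = [[rng.gauss(0.0, 0.02) for _ in range(root_dim)]
                     for _ in range(n_tokens)]
        # Independent derivation weights, one matrix per layer and modality.
        self.text_proj = [make_linear(root_dim, text_dim, rng)
                          for _ in range(text_depth)]
        self.vis_proj = [make_linear(root_dim, vis_dim, rng)
                         for _ in range(vis_depth)]

    def derive(self):
        """Return per-layer prompts for the text and visual encoders."""
        text = [[apply_linear(w, tok) for tok in self.root]
                for w in self.text_proj]
        vis = [[apply_linear(w, tok) for tok in self.root]
               for w in self.vis_proj]
        return text, vis


if __name__ == "__main__":
    model = SharedRootPrompt(n_tokens=4, root_dim=8, text_dim=16, vis_dim=24,
                             text_depth=3, vis_depth=2)
    text_prompts, vis_prompts = model.derive()
    print(len(text_prompts), len(vis_prompts))
```

In this sketch the two branches share no derivation weights, so any coupling between modalities comes only from the common root, which is one possible reading of "implicit synergy". The paper itself should be consulted for the actual mechanism.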
Xiaoyong Mei
Chong Tang
Dai Zhengqun
CAAI Transactions on Intelligence Technology
Johns Hopkins University
Sun Yat-sen University
Zhejiang Normal University
Mei et al. studied this question.
www.synapsesocial.com/papers/6a06b983e7dec685947ac3c7 — DOI: https://doi.org/10.1049/cit2.70143