What question did this study set out to answer?

The study aims to improve the generalization of vision-language models (VLMs) by addressing atypical samples in visual feature spaces.

April 4, 2026

Distribution-Aware Prompt Learning for Vision-Language Models with Dynamic Boundary Prototype

Key Points

The study aims to improve the generalization of vision-language models (VLMs) by addressing atypical samples in visual feature spaces.
Proposed a dynamic boundary prototype, updated at each epoch, to highlight ambiguous samples that lie far from the class centroid.
Introduced Boundary-Centroid Pulling to optimize intra-class distributions.
Developed a distance-weighted contrastive loss for better inter-class separability.
Applied Low-Rank Adaptation Fine-Tuning to modify self-attention layers in the vision encoder.
Utilized a progressive training strategy for stable optimization.
Consistently improved the average performance of CoOp, CoCoOp and PromptKD across 11 benchmark datasets.
Boundary-Centroid Pulling is intended to enhance structural consistency within each class.
The distance-weighted contrastive loss is intended to sharpen fine-grained discrimination between adjacent classes.
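
The boundary prototype and Boundary-Centroid Pulling named above can be illustrated with a small sketch. This page does not give the exact definitions, so the code makes two labeled assumptions: the boundary prototype is taken to be the mean of the `k` features farthest from the class centroid, and the pulling term is the squared Euclidean distance between the two prototypes. The function names and the toy features are hypothetical, not taken from the paper.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(features, k):
    """Return (centroid, boundary) prototypes for the features of one class.

    Assumption: the boundary prototype is the mean of the k features that
    lie farthest from the centroid. The paper may define it differently.
    """
    centroid = mean(features)
    farthest = sorted(features, key=lambda f: sq_dist(f, centroid), reverse=True)[:k]
    return centroid, mean(farthest)


def pulling_loss(features, k):
    """Assumed pulling term: squared distance between the two prototypes."""
    centroid, boundary = prototypes(features, k)
    return sq_dist(boundary, centroid)


# Toy class: three typical samples near the origin and one atypical sample.
class_features = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [2.0, 2.0]]
centroid, boundary = prototypes(class_features, k=1)
loss = pulling_loss(class_features, k=1)
```

In a training loop, the prototypes would be recomputed from the current visual features at each epoch, so the set of samples treated as boundary samples can change as the features move.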

Abstract

Prompt learning, which injects learnable semantic prompts into vision-language models (VLMs) to guide the alignment between visual and textual representations, has emerged as an effective strategy for adapting these models. Although existing methods have shown strong performance across various tasks, they usually focus on representative class-level samples and overlook atypical and hard samples in the visual feature space, which hinders the generalization of VLMs. To address this issue, we propose the concept of a dynamic boundary prototype, which highlights ambiguous samples that are far from the class centroid and is updated at each epoch. Accordingly, we propose a Distribution-Aware Prompt Learning (DAPL) framework that calibrates the distribution of the visual feature space through the definition, optimization, and updating of dynamic boundary prototypes. First, we introduce Boundary-Centroid Pulling to optimize the intra-class distribution by progressively reducing the distance between boundary and centroid prototypes, thereby enhancing structural consistency within each class. Second, to further enhance inter-class separability, we design a distance-weighted contrastive loss that places greater emphasis on distinguishing adjacent classes, facilitating more effective fine-grained discrimination. Third, we apply Low-Rank Adaptation Fine-Tuning to adapt the vision encoder through targeted modifications to its self-attention layers. Additionally, we adopt a progressive training strategy for stable optimization. DAPL is compatible with mainstream prompt learning methods such as CoOp, CoCoOp and PromptKD, and consistently improves their average performance across 11 benchmark datasets.
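
One way to picture a distance-weighted contrastive loss that emphasizes adjacent classes is sketched below. The abstract does not state the formula, so the weighting scheme here is an assumption: negative classes are weighted by a softmax over negative centroid distances, rescaled so the weights average to 1. The names `neighbor_weights`, `distance_weighted_loss`, `sharpness` and `temperature` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_weights(centroids, target, sharpness=1.0):
    """Weights over negative classes, larger for centroids near the target's.

    Assumption: softmax over negative centroid distances, rescaled so the
    weights average to 1. With sharpness=0 every weight equals 1.
    """
    negatives = [j for j in range(len(centroids)) if j != target]
    scores = [
        math.exp(-sharpness * euclidean(centroids[target], centroids[j]))
        for j in negatives
    ]
    total = sum(scores)
    return {j: len(negatives) * s / total for j, s in zip(negatives, scores)}


def distance_weighted_loss(logits, target, centroids, temperature=1.0, sharpness=1.0):
    """Softmax-style contrastive loss with distance-weighted negatives."""
    weights = neighbor_weights(centroids, target, sharpness)
    positive = math.exp(logits[target] / temperature)
    negative = sum(w * math.exp(logits[j] / temperature) for j, w in weights.items())
    return -math.log(positive / (positive + negative))


# Class 1 is adjacent to the target class 0; class 2 is far away.
centroids = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
weights = neighbor_weights(centroids, target=0)
loss_near = distance_weighted_loss([2.0, 1.5, 0.0], 0, centroids)  # confused with class 1
loss_far = distance_weighted_loss([2.0, 0.0, 1.5], 0, centroids)   # confused with class 2
```

Under these assumptions, the same amount of confusion costs more when it involves the adjacent class than when it involves the distant one, and setting `sharpness=0` recovers an ordinary softmax cross-entropy over the logits.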

Cite This Study

Yang et al. studied this question.

synapsesocial.com/papers/69d0ae68659487ece0fa4577 https://doi.org/10.1109/tip.2026.3678014
