What question did this study set out to answer?

The study aims to propose a framework for effective open-set domain adaptation using pre-trained vision-language models without additional training.

May 2, 2026

Training-Free Open-Set Domain Adaptation with Vision-Language Models.

Key Points

The study aims to propose a framework for effective open-set domain adaptation using pre-trained vision-language models without additional training.
Introduced Semantic-guided Target Adaptation (SemTA) for training-free open-set domain adaptation.
Implemented an unknown semantic discovery module to label unknown classes using cluster centroids.
Employed a dual sample attention mechanism for improved inference.
Achieved state-of-the-art performance across four benchmarks without the need for training.
Showed enhanced interpretability compared to other open-set methods that rely on confidence thresholds.

Abstract

With the prevalence of pre-trained vision-language models like CLIP, leveraging the generic knowledge embedded in CLIP for domain adaptation has proved to be a promising direction. However, most existing CLIP-based methods are limited to closed-set settings. This is primarily because CLIP needs the semantic labels of unknown classes for inference, thus making it not applicable to Open-Set Domain Adaptation (OSDA). To utilize the complementary roles of CLIP and the source model, our paper proposes a novel Semantic-guided Target Adaptation (SemTA) framework for OSDA in a training-free manner. Specifically, we introduce an unknown semantic discovery module. It uses the cluster centroids of the target data to obtain the semantic labels of unknown classes from the worldwide corpus. Then, the semantic-based inference can be performed with CLIP. Additionally, the dual sample attention mechanism is implemented to output sample-based inference. Representative features from both the source model and CLIP serve as the key to improve task specificity. Compared to previous OSDA methods which reject unknown data by confidence threshold, the proposed approach is more practical and offers better interpretability. Comprehensive evaluations on four benchmarks reveal our method sets a new state-of-the-art even without training. Our code will be publicly available soon.

AIに質問

Bookmark

AIに質問

Bookmark

Training-Free Open-Set Domain Adaptation with Vision-Language Models.

Key Points

Abstract

Cite This Study