Zero-shot anomaly detection (ZSAD) aims to detect anomalies in images without any prior knowledge of the anomaly classes. The task is especially difficult because anomalies are rare, diverse, and often manifest differently across domains, which makes it hard for models to generalize when training data is scarce or unavailable. Recently, vision-language models (VLMs) such as CLIP have shown great potential in ZSAD, but they often struggle to adapt to unseen domains because they lack domain-aware knowledge. To address these challenges, we propose Domain Adaptation CLIP (DA-CLIP), a novel approach that adapts domain-aware knowledge to the VLM. Specifically, DA-CLIP leverages a Domain-Aware Knowledge Adaptation (DAKA) strategy to enhance CLIP for ZSAD across different domains. The DAKA strategy comprises multiple experts that specialize in target domains, enabling the model to dynamically select and combine the experts best suited to the anomaly characteristics, thus improving its ability to generalize and detect a wide range of anomalies. Furthermore, we introduce learnable domain-aware prompts that are jointly learned and injected into both the CLIP encoders (visual and text) and the DAKA modules. This dual-pathway learning enables the model to capture domain-specific features at multiple levels of the architecture, allowing for more effective adaptation to new domains and anomaly types. We evaluate our approach on several benchmark datasets spanning industrial and medical domains. Extensive experiments demonstrate that DA-CLIP consistently outperforms state-of-the-art methods in ZSAD, achieving significant improvements in both image-level and pixel-level anomaly detection tasks.
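The abstract does not spell out how experts are selected or how anomaly scores are computed, so the following is only a minimal sketch of the two general ideas it names: a gated mixture of experts that adapts a feature, and CLIP-style zero-shot scoring against "normal" and "anomalous" text embeddings. The function names, the softmax gate, the top-k routing, the residual connection, and the temperature of 0.07 are all illustrative assumptions, not the authors' DAKA design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, x) for row in W]

def normalize(v):
    # Scale a vector to unit L2 norm.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def mix_experts(feature, gate_weights, experts, top_k=2):
    """Hypothetical expert routing: a softmax gate scores every expert,
    the top_k experts are kept, and their outputs are added to the
    input feature with renormalized gate weights (residual form)."""
    gate = softmax(matvec(gate_weights, feature))
    chosen = sorted(range(len(gate)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)
    out = list(feature)  # residual path: start from the unadapted feature
    for i in chosen:
        expert_out = matvec(experts[i], feature)
        w = gate[i] / norm
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out

def anomaly_score(image_feature, normal_text, anomalous_text, temperature=0.07):
    """CLIP-style zero-shot score: softmax over cosine similarities to a
    'normal' and an 'anomalous' text embedding; returns P(anomalous)."""
    f = normalize(image_feature)
    logits = [
        dot(f, normalize(normal_text)) / temperature,
        dot(f, normalize(anomalous_text)) / temperature,
    ]
    return softmax(logits)[1]
```

In this sketch, an adapted image or patch feature from `mix_experts` would be passed to `anomaly_score`. Applying the score per patch would give a pixel-level map, and pooling over patches an image-level score. How DA-CLIP actually combines its experts and prompts is described only at a high level in the abstract.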
Zeqi Ma
Xin Fang
Yue Huang
IEEE Transactions on Image Processing
Harbin Institute of Technology
Guangdong University of Technology
Cloud Computing Center
Ma et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69b64c67b42794e3e660da98 — DOI: https://doi.org/10.1109/tip.2026.3671665