What question did this study set out to answer?

This research aims to improve zero-shot anomaly detection by integrating domain-aware knowledge into vision-language models.

March 15, 2026

Adapting Domain-Aware Knowledge to Vision-language Model for Zero-shot Anomaly Detection

Key Points

This research aims to improve zero-shot anomaly detection by integrating domain-aware knowledge into vision-language models.
Developed Domain Adaptation CLIP (DA-CLIP) for enhancing vision-language models in detecting anomalies.
Implemented a Domain-Aware Knowledge Adaptation (DAKA) strategy with multiple domain-specific experts.
Introduced learnable domain-aware prompts for both visual and text encoders to improve feature capture.
Conducted evaluations on various benchmark datasets from industrial and medical contexts.
DA-CLIP consistently outperformed state-of-the-art methods in zero-shot anomaly detection.
Significant improvements were observed in image-level and pixel-level anomaly detection tasks.
Enhanced model generalization capabilities across different domains and anomaly types.
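
The expert-based adaptation named in the key points can be pictured as a small mixture-of-experts layer. The sketch below is illustrative only and is not the authors' implementation: the linear gate, the `top_k` selection, the residual connection, and every name (`route_through_experts`, `gate_weights`, `experts`) are assumptions introduced here to show how a model might dynamically select and combine domain-specific experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def route_through_experts(feature, gate_weights, experts, top_k=2):
    """Adapt a feature with the top_k experts picked by a linear gate.

    Hypothetical design: each expert is a square matrix, and the gate
    produces one logit per expert from the same input feature.
    """
    gate_logits = matvec(gate_weights, feature)
    # Keep only the top_k experts with the highest gate logits.
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Renormalize the gate over the chosen experts only.
    weights = softmax([gate_logits[i] for i in chosen])

    adapted = [0.0] * len(feature)
    for weight, index in zip(weights, chosen):
        expert_out = matvec(experts[index], feature)
        adapted = [a + weight * o for a, o in zip(adapted, expert_out)]

    # Residual connection (an assumption): keep the original feature
    # and add the expert-adapted signal on top of it.
    output = [f + a for f, a in zip(feature, adapted)]
    return output, dict(zip(chosen, weights))

feature = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # three experts
experts = [
    [[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
    [[5.0, 0.0], [0.0, 5.0]],   # expert 1: never selected for this input
    [[0.0, 0.0], [0.0, 0.0]],   # expert 2: zero map
]
output, routing = route_through_experts(feature, gate_weights, experts)
print(output, routing)
```

Selecting only a few experts per input keeps the adaptation sparse; a dense weighted sum over all experts would be an equally plausible reading of the description.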

Abstract

Zero-shot anomaly detection (ZSAD) is a challenging task that aims to detect anomalies in images without any prior knowledge of the anomaly classes. This task is especially difficult because anomalies are rare, diverse, and often manifest differently across domains, making it hard for models to generalize when training data is scarce or unavailable. Recently, vision-language models (VLMs), such as CLIP, have shown great potential in ZSAD, but they often struggle to adapt to unseen domains due to the lack of domain-aware knowledge. To address these challenges, we propose the Domain Adaptation CLIP (DA-CLIP), a novel approach that adapts domain-aware knowledge to the VLM. Specifically, DA-CLIP leverages a Domain-Aware Knowledge Adaptation (DAKA) strategy to enhance CLIP for ZSAD across different domains. The DAKA strategy comprises multiple experts that specialize in target domains, enabling the model to dynamically select and combine specialized experts tailored to anomaly characteristics, thus improving its ability to generalize and detect a wide range of anomalies. Furthermore, we introduce learnable domain-aware prompts that are learned jointly with, and injected into, both CLIP encoders (visual and text) and the DAKA modules. This dual-pathway learning enables the model to capture domain-specific features at multiple levels of the architecture, allowing for more effective adaptation to new domains and anomaly types. We evaluate our approach on several benchmark datasets spanning industrial and medical domains. Extensive experiments demonstrate that DA-CLIP consistently outperforms state-of-the-art methods in ZSAD, achieving significant improvements in both image-level and pixel-level anomaly detection tasks.
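
For readers unfamiliar with how a CLIP-style model can score anomalies without anomaly-specific training, the following is a minimal sketch of the general recipe, not DA-CLIP itself. It assumes that embeddings for a "normal" and an "anomalous" text prompt are already available, and it compares them with a global image embedding (image-level score) and with per-patch embeddings (pixel-level map). The toy vectors, the function names, and the temperature of 0.07 are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def anomaly_probability(embedding, normal_text, anomalous_text, temperature=0.07):
    """Two-way softmax over text similarities; returns P(anomalous)."""
    s_normal = cosine(embedding, normal_text) / temperature
    s_anomalous = cosine(embedding, anomalous_text) / temperature
    peak = max(s_normal, s_anomalous)          # subtract max for stability
    e_normal = math.exp(s_normal - peak)
    e_anomalous = math.exp(s_anomalous - peak)
    return e_anomalous / (e_normal + e_anomalous)

def score_image(global_embedding, patch_embeddings, normal_text, anomalous_text):
    """Return an image-level score and a patch-level anomaly map."""
    image_score = anomaly_probability(global_embedding, normal_text, anomalous_text)
    anomaly_map = [
        [anomaly_probability(patch, normal_text, anomalous_text) for patch in row]
        for row in patch_embeddings
    ]
    return image_score, anomaly_map

# Toy 2-D embeddings standing in for real encoder outputs.
normal_text = [1.0, 0.0]
anomalous_text = [0.0, 1.0]
global_embedding = [1.0, 1.0]                  # equally close to both prompts
patch_embeddings = [[[1.0, 0.0], [0.0, 1.0]]]  # one row with two patches

image_score, anomaly_map = score_image(
    global_embedding, patch_embeddings, normal_text, anomalous_text
)
print(image_score, anomaly_map)
```

In a real pipeline the patch-level map would typically be upsampled to the input resolution to give per-pixel scores; that step is omitted here.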
