Zero-Shot Anomaly Detection (ZSAD) aims to generalize to unseen domains without requiring any target-domain samples, and is commonly applied to address the cold-start problem in industrial settings. However, existing ZSAD methods usually fail to capture anomaly semantics in specific contexts, which limits their cross-domain generalization. Moreover, vision-language models (VLMs) typically show limited sensitivity to subtle anomalous patterns, making it difficult to explicitly guide VLM-based ZSAD models to focus on anomalies. To address these issues, this paper proposes a framework that integrates anomaly-aware prompt learning and feature adaptation through two key components: an anomaly-aware textual prompt module and multi-scale feature adapters. Specifically, we dynamically incorporate the intrinsic local and global anomaly semantics of test images into the textual prompts to achieve deep alignment between the visual and textual modalities. To focus on defects at different levels, we introduce adapters that aggregate multi-scale visual features, thereby enhancing fine-grained anomaly perception. In addition, the framework is extended to the few-shot setting to effectively leverage limited target-domain samples. Extensive experiments on 15 industrial and medical datasets demonstrate that the proposed method achieves state-of-the-art (SOTA) ZSAD performance. Notably, its performance improves significantly further in the few-shot setting, indicating strong application potential.
Li et al. studied this question.
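
For readers unfamiliar with VLM-based anomaly scoring, the following is a minimal sketch of the generic idea that such methods build on: each image patch feature is compared with a "normal" and an "anomalous" text embedding, and the resulting per-patch probabilities are averaged over several feature scales. It is not the framework proposed here; the anomaly-aware prompt module and the learned adapters are omitted, and all function names, the toy 2-D embeddings, and the temperature value are assumptions made purely for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def anomaly_probability(patch, normal_text, anomalous_text, temperature=0.07):
    """Two-way softmax over text similarities; returns P(anomalous).

    The temperature is an illustrative value, not taken from the paper.
    """
    s_n = cosine(patch, normal_text) / temperature
    s_a = cosine(patch, anomalous_text) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)


def anomaly_map(scales, normal_text, anomalous_text):
    """Average per-patch anomaly probabilities over feature scales.

    `scales` is a list with one entry per scale; each entry is a list of
    patch feature vectors, with the same number of patches at every scale.
    A plain mean stands in for a learned multi-scale aggregation.
    """
    per_scale = [
        [anomaly_probability(p, normal_text, anomalous_text) for p in patches]
        for patches in scales
    ]
    n_patches = len(scales[0])
    return [
        sum(scale[i] for scale in per_scale) / len(per_scale)
        for i in range(n_patches)
    ]


# Toy embeddings (hypothetical): axis 0 ~ "normal", axis 1 ~ "anomalous".
normal_text = [1.0, 0.0]
anomalous_text = [0.0, 1.0]
scales = [
    [[0.9, 0.1], [0.2, 0.8]],  # scale 1: patch 0 looks normal, patch 1 does not
    [[1.0, 0.0], [0.1, 0.9]],  # scale 2: same two patches at another scale
]
scores = anomaly_map(scales, normal_text, anomalous_text)
image_score = max(scores)  # one common image-level score: the worst patch
```

In this toy setup the first patch receives a score near 0 and the second a score near 1. Taking the maximum patch score as the image-level score is one common convention and is likewise an assumption here.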