What question did this study set out to answer?

The aim is to enhance zero-shot anomaly detection by integrating foundation models to capture fine-grained semantics.

February 14, 2026 · Open Access

DCS: A Zero-Shot Anomaly Detection Framework with DINO-CLIP-SAM Integration

Key Points

Aims to enhance zero-shot anomaly detection by integrating foundation models to capture fine-grained anomaly semantics.
Developed DCS, a unified framework integrating Grounding DINO, CLIP, and SAM.
Introduced FinePrompt, which pairs a fine-grained anomaly description library with learnable text embeddings.
Designed an Adaptive Dual-path Cross-modal Interaction (ADCI) module for more effective cross-modal information exchange.
Implemented a Box-Point Prompt Combiner (BPPC) that guides SAM toward finer and more complete segmentation results.
DCS achieved state-of-the-art zero-shot performance on the MVTec-AD and VisA datasets.
Demonstrated improved identification of minor defects and fine-grained anomalies.

Abstract

Recent progress in foundation models such as CLIP and SAM has shown great potential for zero-shot anomaly detection. However, existing methods usually rely on generic descriptions such as “abnormal”, whose semantic coverage is too narrow to express fine-grained anomaly semantics. In addition, CLIP primarily performs global-level alignment and therefore struggles to localize minor defects accurately, while the segmentation quality of SAM depends heavily on the prompts it is given. To address these problems, we propose DCS, a unified framework that integrates Grounding DINO, CLIP, and SAM through three key innovations. First, we introduce FinePrompt for adaptive learning, which strengthens the modeling of anomaly semantics by building a fine-grained anomaly description library and adopting learnable text embeddings. Second, we design an Adaptive Dual-path Cross-modal Interaction (ADCI) module that achieves more effective cross-modal information exchange through dual-path fusion. Finally, we propose a Box-Point Prompt Combiner (BPPC), which combines the box priors provided by Grounding DINO with the point prompts generated by CLIP to guide SAM toward finer and more complete segmentation results. Extensive experiments demonstrate the effectiveness of our method: on the MVTec-AD and VisA datasets, DCS achieves state-of-the-art zero-shot anomaly detection results.
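To make the idea behind the Box-Point Prompt Combiner more concrete, the following is a minimal sketch of one way a box prior and point prompts could be merged before being passed to a promptable segmenter. It is not the paper's implementation, which the abstract does not detail. The function name `combine_box_point_prompts`, the top-k selection rule, the `min_score` threshold, and the assumption that the CLIP branch yields a per-pixel anomaly map are all hypothetical choices made for illustration.

```python
import numpy as np


def combine_box_point_prompts(anomaly_map, box, num_points=3, min_score=0.5):
    """Select high-scoring pixels inside a box as positive point prompts.

    Illustrative assumption, not the published BPPC: points are the top-k
    anomaly-map pixels that fall inside the detector box.

    anomaly_map: (H, W) float array of per-pixel anomaly scores.
    box: (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    Returns a dict with "box" (4,), "point_coords" (K, 2) as (x, y),
    and "point_labels" (K,), where 1 marks a foreground point.
    """
    height, width = anomaly_map.shape
    x0, y0, x1, y1 = (int(v) for v in box)

    # Clip the box to the image so slicing never leaves the map.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the anomaly map")

    region = anomaly_map[y0:y1, x0:x1]
    flat = region.ravel()

    # Rank pixels in the box by score (descending) and keep the top k.
    k = min(num_points, flat.size)
    order = np.argsort(flat)[::-1][:k]

    # Drop weak candidates so low-confidence pixels do not become prompts.
    order = order[flat[order] >= min_score]

    # Convert flat indices back to (row, col), then to image-space (x, y).
    rows, cols = np.unravel_index(order, region.shape)
    coords = np.stack([cols + x0, rows + y0], axis=1).astype(np.float32)

    return {
        "box": np.array([x0, y0, x1, y1], dtype=np.float32),
        "point_coords": coords,
        "point_labels": np.ones(len(coords), dtype=np.int32),
    }


if __name__ == "__main__":
    scores = np.zeros((8, 8), dtype=np.float32)
    scores[2, 3] = 0.9   # inside the box
    scores[5, 5] = 0.8   # inside the box
    scores[0, 0] = 0.95  # outside the box, so it is ignored
    prompts = combine_box_point_prompts(scores, box=(1, 1, 7, 7), num_points=2)
    print(prompts["point_coords"])
```

Restricting point selection to the box keeps the two prompt types consistent with each other: the box bounds the region of interest, while the points indicate where inside it the anomaly evidence is strongest. If no pixel in the box reaches `min_score`, the sketch returns an empty point set and the box alone serves as the prompt.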
