What question did this study set out to answer?

The study aimed to create a framework that synthesizes realistic defect images to rebalance training data for automated visual inspection, where defective samples are scarce.

April 12, 2026 · Open Access

CLARIS: Control-based language-guided realistic imperfection synthesis

Key Points

Set out to build a framework that synthesizes realistic defect images to offset the shortage of defective samples in automated inspection data.
Developed CLARIS, which combines natural-language guidance with 3D structural constraints.
Used a Vision-Language Model (VLM) to turn user instructions and input images into tailored text prompts and defect masks.
Applied ControlNet with normal maps as constraints so that synthesized defects follow the object's shape and surface curvature.
Employed Textual Inversion (TI) and Low-Rank Adaptation (LoRA) to learn defect-specific visual characteristics with few trainable parameters.
Achieved an average Kernel Inception Distance (KID) of 11.07 across the 15 MVTec AD categories.
Reached an Inception Score (IS) of 1.63, indicating good image quality.
Reached an intra-cluster pairwise LPIPS distance (IC-LPIPS) of 0.27, a measure of the diversity of the generated defect images.
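To illustrate why LoRA needs so few trainable parameters, the following minimal sketch wraps a frozen weight matrix W with a trainable low-rank update, computing W x + (alpha / r) B A x. This is the generic LoRA formulation, not CLARIS's code: the class name `LoRALinear`, the helper `lora_param_count`, the rank, alpha, and initialization scale are illustrative assumptions, and the summary does not say which layers CLARIS adapts. Textual Inversion is more economical still, since it trains only the embedding vector of one new placeholder token.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration of the generic LoRA idea, not the paper's code.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        self.weight = weight  # pretrained weight (d_out x d_in), kept frozen
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so before training the layer behaves exactly like the frozen W.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # W x (frozen path)
        delta = matvec(self.B, matvec(self.A, x))   # B (A x) (trainable path)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained; W is never modified.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r update: r * (d_in + d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # For a hypothetical 768 x 768 projection, rank 4 trains 6,144 values instead of 589,824.
    print(lora_param_count(768, 768, 4), 768 * 768)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output, so fine-tuning begins from the base model's behavior and only the small A and B matrices change.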

Abstract

Automated vision inspection is vital in modern manufacturing, but advanced processes with high yield rates cause a severe data imbalance: abundant normal data and scarce defective data. To overcome this, we propose CLARIS (Control-based Language-guided Realistic Imperfection Synthesis), a novel framework combining natural language semantic flexibility with 3D structural constraints to generate physically consistent, high-quality defect images. CLARIS utilizes a Vision-Language Model (VLM) to interpret user instructions and input images, dynamically generating tailored text prompts and defect masks. Subsequently, ControlNet ensures the synthesized defects adhere to the object's physical shape and surface curvature by explicitly applying normal maps as constraints. Furthermore, Textual Inversion (TI) and Low-Rank Adaptation (LoRA) are employed to efficiently learn and reflect the unique visual characteristics of specific defects using minimal parameters. Evaluated on the 15 categories of the MVTec Anomaly Detection (MVTec AD) dataset, the framework achieved an average Kernel Inception Distance (KID) of 11.07, Inception Score (IS) of 1.63, and intra-cluster pairwise LPIPS distance (IC-LPIPS) of 0.27.
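The three reported metrics can be sketched in plain Python to make their definitions concrete. This is an illustrative sketch under stated assumptions, not the evaluation code used for CLARIS. It assumes Inception features and class probabilities have already been extracted (the Inception network itself is omitted). KID is computed as the unbiased squared MMD with the cubic polynomial kernel on a single subset, whereas common implementations average over many random subsets. IC-LPIPS assigns each generated image to its nearest real anchor image, then averages the mean pairwise distance within each cluster, with a caller-supplied `dist` standing in for the learned LPIPS metric. The function names are hypothetical, and the summary does not state how the reported KID of 11.07 is scaled.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
Vector = Sequence[float]


def cubic_kernel(x: Vector, y: Vector) -> float:
    """Polynomial kernel used by KID: k(x, y) = (x . y / d + 1) ** 3."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def kid(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased squared MMD between two feature sets (one subset, no averaging)."""
    m, n = len(real), len(fake)
    # combinations() yields each unordered pair once, so double it to sum over i != j.
    k_rr = 2.0 * sum(cubic_kernel(a, b) for a, b in combinations(real, 2)) / (m * (m - 1))
    k_ff = 2.0 * sum(cubic_kernel(a, b) for a, b in combinations(fake, 2)) / (n * (n - 1))
    k_rf = sum(cubic_kernel(a, b) for a in real for b in fake) / (m * n)
    return k_rr + k_ff - 2.0 * k_rf


def inception_score(probs: Sequence[Vector]) -> float:
    """IS = exp(mean KL(p(y|x) || p(y))) from per-image class probabilities."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    mean_kl = sum(
        sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
        for p in probs
    ) / n
    return math.exp(mean_kl)


def ic_lpips(generated: Sequence[T], anchors: Sequence[T],
             dist: Callable[[T, T], float]) -> float:
    """Mean within-cluster pairwise distance; `dist` stands in for LPIPS."""
    clusters: Dict[int, List[T]] = {i: [] for i in range(len(anchors))}
    for g in generated:
        # Assign each generated sample to its closest anchor (real) image.
        nearest = min(range(len(anchors)), key=lambda i: dist(g, anchors[i]))
        clusters[nearest].append(g)
    scores = []
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        if pairs:  # clusters with fewer than two members contribute nothing
            scores.append(sum(dist(a, b) for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0
```

Because the KID estimator is unbiased, it can come out slightly negative when the two feature sets are drawn from the same distribution; lower values indicate closer feature distributions, while higher IS and IC-LPIPS values indicate more confident classification and more diverse samples, respectively.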
