February 26, 2026 | Open Access

Localizing Perceptual Artifacts in Synthetic Images for Image Quality Assessment via Deep-Learning-Based Anomaly Detection

Key Points

The study aims to localize perceptual artifacts in synthetic images using minimal supervision.
It proposes a Mask-based Semantic Rejection (MSR) mechanism within a semantic segmentation architecture.
MSR uses the "one-vs-all" property of object queries to detect regions that fail to match any semantic category.
A flexible adaptation strategy supports both zero-shot inference and fine-tuning with minimal supervision.
On the text-to-image task, MSR improves mIoU by 6.52% and 13.06% when using 10% and 50% of labeled samples, respectively.
Across 11 synthesis tasks, MSR outperforms state-of-the-art methods, particularly in data-efficient scenarios.
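
For readers unfamiliar with the metric behind the reported gains, the following is a minimal sketch of how an intersection-over-union score between a predicted artifact mask and an annotated one could be computed and averaged. The function names `binary_iou` and `mean_iou` are hypothetical, and averaging over images is an assumption; the study's exact mIoU protocol (for example, averaging over classes instead) is not stated on this page.

```python
import numpy as np


def binary_iou(pred, target):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty. Treating this as a perfect match is a
        # convention assumed here, not something taken from the study.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


def mean_iou(preds, targets):
    """Average the per-image IoU over a collection of mask pairs (assumed protocol)."""
    return float(np.mean([binary_iou(p, t) for p, t in zip(preds, targets)]))
```

A prediction that covers two pixels while the annotation covers one of them yields an IoU of 0.5, since the intersection is one pixel and the union is two.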

Abstract

While deep generative models, such as text-to-image diffusion models, demonstrate strong capabilities in synthesizing photorealistic images, they frequently produce perceptual artifacts (e.g., distorted structures or unnatural textures) that require manual correction. Existing artifact localization methods typically rely on fully supervised training with large-scale pixel-level annotations, which incurs high labeling costs. To address this challenge, we propose a novel framework based on the core insight that perceptual artifacts can be modeled as “semantic outliers”: regions that fail to match any pre-defined semantic category. Instead of learning specific artifact features, we introduce a Mask-based Semantic Rejection (MSR) mechanism within a semantic segmentation architecture. This mechanism leverages the “one-vs-all” property of object queries to identify regions that are consistently rejected by all pre-trained semantic categories. Furthermore, we design a flexible adaptation strategy that supports both zero-shot inference using pre-trained semantic knowledge and fine-tuning with a margin-based suppression objective that explicitly optimizes the rejection boundary using minimal supervision. Comprehensive experiments across 11 synthesis tasks demonstrate that MSR significantly outperforms state-of-the-art methods, particularly in data-efficient scenarios. Specifically, the framework achieves mIoU improvements of 6.52% and 13.06% on the text-to-image task using only 10% and 50% of labeled samples, respectively.
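
The rejection idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a query-based segmentation head that outputs per-query class logits (with a final "no object" column) and per-query mask logits, combines them in the usual way for mask-classification models, and scores each pixel by how weakly every semantic category claims it. The function names, the `1 - max` scoring rule, the hinge form of the loss, and the default `margin=0.2` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_scores(class_logits, mask_logits):
    """Per-category pixel scores of shape (K, H, W).

    class_logits: (Q, K + 1), last column is the assumed "no object" class.
    mask_logits:  (Q, H, W), one mask per object query.
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object"
    mask_probs = sigmoid(mask_logits)
    # Sum over queries: each query votes for its category inside its mask.
    return np.einsum("qk,qhw->khw", class_probs, mask_probs)


def rejection_map(class_logits, mask_logits):
    """Artifact score in [0, 1]: high where no category claims the pixel."""
    best = semantic_scores(class_logits, mask_logits).max(axis=0)
    return 1.0 - np.clip(best, 0.0, 1.0)


def margin_suppression_loss(class_logits, mask_logits, artifact_mask, margin=0.2):
    """Hinge-style penalty (assumed form) that pushes the strongest semantic
    score on annotated artifact pixels below `margin`."""
    artifact_mask = np.asarray(artifact_mask, dtype=bool)
    if not artifact_mask.any():
        return 0.0
    best = semantic_scores(class_logits, mask_logits).max(axis=0)
    return float(np.maximum(0.0, best[artifact_mask] - margin).mean())
```

In a zero-shot setting, thresholding `rejection_map` would give a candidate artifact mask directly from pre-trained semantic predictions; in a fine-tuning setting, a loss of this kind would be minimized on the small labeled subset. Both uses are sketched under the assumptions above rather than taken from the study.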
