What question did this study set out to answer?

This research aims to improve the detection of generative image forgeries using a novel multi-modal framework.

May 1, 2026

FTGID: Fine-grained Text-driven Framework for Universal Generative Image Detection.

Key Points

Developed the Layer-wise Adaptive Global Extractor (LAGE), which stabilizes multi-level global representations through adaptive CLS token fusion.
Designed the Fine-grained Text-guided Local Enhancer (FTLE), which uses patch-level text-visual interaction to better localize forgery-relevant regions.
Introduced the High-frequency Artifact Feature Extractor (HAFE), which adaptively captures high-frequency cues to identify subtle generative artifacts.
FTGID outperformed state-of-the-art generative image detection methods across diverse generative models and unseen datasets.
Improved both robustness and interpretability in open-world generative image detection.
Detected subtle generative artifacts more reliably through adaptive high-frequency cue extraction.

Abstract

The rapid progress of generative models has made detecting realistic forgeries a critical challenge for security and trust. Existing image- and frequency-based methods depend on dataset-specific artifacts and generalize poorly, while Vision-Language Model (VLM)-based methods remain limited by coarse prompts and underused cross-modal alignment. To address these issues, we propose a Fine-grained Text-driven Generative Image Detection (FTGID) framework, which enables comprehensive detection through multi-modal cues. First, we design a Layer-wise Adaptive Global Extractor (LAGE) that stabilizes multi-level global representations through adaptive CLS token fusion with lightweight calibration and parameter-efficient tuning. Second, we propose a Fine-grained Text-guided Local Enhancer (FTLE) that performs patch-level text-visual interaction to enhance the localization of forgery-relevant regions. Third, we introduce a High-frequency Artifact Feature Extractor (HAFE) that adaptively captures discriminative high-frequency cues, enabling more reliable detection of subtle generative artifacts. Extensive experiments demonstrate that FTGID consistently outperforms state-of-the-art GID methods across diverse generative models and unseen datasets, enhancing both robustness and interpretability in open-world generative image detection. Our code will be made publicly available after the peer review process.
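
The three cue types named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which has not been released: every function name below is hypothetical, the layer weights are plain inputs rather than learned parameters, and the high-pass filter is a fixed 3x3 mean-blur subtraction standing in for the adaptive extraction that HAFE is described as performing.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cls_tokens(cls_tokens, layer_logits):
    """Softmax-weighted sum of per-layer CLS vectors.

    Hypothetical stand-in for LAGE-style fusion: in a trained model the
    logits would be learned; here they are supplied by the caller.
    """
    weights = softmax(layer_logits)
    dim = len(cls_tokens[0])
    return [sum(w * tok[i] for w, tok in zip(weights, cls_tokens))
            for i in range(dim)]


def patch_text_weights(patch_embeddings, text_embedding):
    """Softmax over patch-to-text cosine similarities.

    Hypothetical stand-in for FTLE-style interaction: patches that align
    with the text prompt receive larger weights.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return softmax([cosine(p, text_embedding) for p in patch_embeddings])


def high_frequency_residual(image):
    """Subtract a 3x3 mean blur from a 2D grayscale image (list of rows).

    A fixed high-pass filter, assumed here for illustration only; the
    neighbourhood is clipped at the image border.
    """
    h, w = len(image), len(image[0])
    residual = []
    for r in range(h):
        row = []
        for c in range(w):
            neigh = [image[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(image[r][c] - sum(neigh) / len(neigh))
        residual.append(row)
    return residual
```

In this sketch, equal layer logits reduce `fuse_cls_tokens` to a plain average of the layers, and a perfectly flat image yields an all-zero residual, so only local intensity changes survive the high-pass step.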
