What question did this study set out to answer?

This survey reviews instruction-based image editing advancements and evaluates various models.

March 8, 2026 · Open Access

Instruction-based image editing: a survey on data, models, evaluation, and applications

Key Points

Analyzed task definitions and the hierarchical categories of editing operations
Reviewed methodologies for training data construction
Examined architectural evolution from GAN-based to diffusion and autoregressive paradigms
Assessed standardized evaluation metrics and benchmarks
Introduced the CDD-IIE benchmark for performance assessment
Identified critical technological milestones in model evolution
Highlighted strengths and limitations of open-source solutions

Abstract

Instruction-based Image Editing (IIE) aims to transform a given image into a new one based on textual instructions. Advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have accelerated progress toward practical “one-sentence image editing” systems. This survey presents a systematic taxonomy and comprehensive review of IIE research, structured around five core dimensions: (1) task definition and hierarchical categorization of editing operations, (2) methodologies for training data construction, (3) architectural evolution from GAN-based to diffusion and autoregressive paradigms, (4) standardized evaluation metrics and benchmark development, and (5) an introduction to commercial solutions. Our analysis identifies critical technological milestones across model generations. We further propose a Comprehensive, in-Depth, and Diagnostic benchmark for the IIE task (CDD-IIE Bench), designed to rigorously assess multiple aspects of model performance. Through empirical comparisons of open-source solutions, we highlight their respective capabilities and limitations. Finally, we discuss future research directions to advance the field.
