Abstract Instruction-based Image Editing (IIE) aims to transform a given image into a new one based on textual instructions. Advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have accelerated progress toward practical “one-sentence image editing” systems. This survey presents a systematic taxonomy and comprehensive review of IIE research, structured around five core dimensions: (1) task definition and hierarchical categorization of editing operations, (2) methodologies for training data construction, (3) architectural evolution from GAN-based to diffusion and autoregressive paradigms, (4) standardized evaluation metrics and benchmark development, and (5) an introduction to commercial solutions. Our analysis identifies critical technological milestones across model generations. We further propose a Comprehensive, in-Depth, and Diagnostic benchmark for the IIE task (CDD-IIE Bench), which rigorously assesses multiple aspects of model performance. Through empirical comparisons of open-source solutions, we highlight their respective capabilities and limitations. Finally, we discuss future research directions to advance the field.
Xianghao Zang
Zijian Jiang
J. Cheng
Vicinagearth.
China Telecom (China)
China Telecom
Zang et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69acc59c32b0ef16a40502f7 — DOI: https://doi.org/10.1007/s44336-026-00034-3