What question did this study set out to answer?

This research aims to develop a framework for multi-text image editing that trains stably and supports real-time editing with a single model.

February 24, 2023
Open Access

Where you edit is what you get: Text-guided image editing with region-based attention

Key Points

Introduced a novel framework for multi-text image editing that trains stably within a single model, without per-sample or per-prompt optimization.
Adopted a region-based attention mechanism for spatially localized editing.
Conducted experiments primarily in the face domain.
Achieved real-time interaction when editing images with multiple text prompts.
Enabled high-quality sequential editing and regional style transfer.
Demonstrated effective spatial disentanglement, minimizing changes to irrelevant regions.
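
Spatially localized editing can be illustrated with a minimal sketch. It assumes the region-based attention ultimately produces a soft spatial mask that gates where an edit is applied. This is not the authors' implementation. The function names, the sigmoid-based mask, and the single-channel grid representation are all illustrative assumptions.

```python
import math
from typing import List

# Hypothetical simplification: one single-channel H x W feature map as nested lists.
Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def region_mask(logits: Grid, temperature: float = 1.0) -> Grid:
    """Assumed step: turn per-location relevance logits into a soft mask in [0, 1]."""
    return [[sigmoid(v / temperature) for v in row] for row in logits]


def blend_with_region_mask(original: Grid, edited: Grid, mask: Grid) -> Grid:
    """Take the edit where mask is near 1 and keep the original where mask is near 0."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]
```

Consider a hard mask of `[[1.0, 0.0], [0.0, 0.0]]`. Only the top-left location receives the edit, and every other location keeps its original value. This is the locality property the key points describe: changes outside the selected region are suppressed by construction rather than by a loss term.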

Abstract

• A novel framework is proposed that enables stable training of multi-text image editing within one model, without the need for per-sample or per-prompt optimization.
• A region-based attention mechanism is adopted to ensure spatially localized editing.
• With these designs, real-time interaction is enabled, and practical applications such as sequential editing can be achieved with high quality.

Leveraging the abundant knowledge learned by pre-trained multi-modal models such as CLIP has recently proved effective for text-guided image editing. Although convincing results have been obtained by combining the image generator StyleGAN with CLIP, most methods must train a separate model for each prompt, and irrelevant regions are often changed after editing because of a lack of spatial disentanglement. We propose a novel framework that can edit different images according to different prompts within one model. In addition, an innovative region-based spatial attention mechanism is adopted to explicitly guarantee the locality of editing. Experiments, mainly in the face domain, verify the feasibility of our framework. Because both multi-text editing and local editing are achievable, the experiments also show that our method supports practical applications such as sequential editing and regional style transfer.
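
The abstract says one model edits different images according to different prompts, instead of training a separate model for each prompt. One common way to get this behavior is a latent mapper conditioned on a text embedding (for example, a CLIP text embedding) that predicts an offset in the generator's latent space. The summary does not confirm that the paper uses this design, so treat it as an assumption. The sketch below is minimal and makes further hypothetical choices. The `ConditionalMapper` class, its single linear layer, and its random weights stand in for a trained network.

```python
import random
from typing import List

Vector = List[float]


class ConditionalMapper:
    """Hypothetical mapper shared by all prompts.

    It predicts a latent offset from the concatenation [w ; t] of an image
    latent w and a text embedding t, so one set of weights serves every prompt.
    """

    def __init__(self, latent_dim: int, text_dim: int, seed: int = 0, scale: float = 0.1):
        rng = random.Random(seed)
        in_dim = latent_dim + text_dim
        # Placeholder for trained weights: weights[i][j] maps input j to output i.
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(latent_dim)]
        self.latent_dim = latent_dim
        self.text_dim = text_dim

    def offset(self, w: Vector, t: Vector) -> Vector:
        """Predict a latent offset conditioned on both the image latent and the prompt."""
        if len(w) != self.latent_dim or len(t) != self.text_dim:
            raise ValueError("dimension mismatch")
        x = list(w) + list(t)  # condition by concatenation
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.weights]

    def edit(self, w: Vector, t: Vector, strength: float = 1.0) -> Vector:
        """Return w + strength * offset(w, t)."""
        return [wi + strength * di for wi, di in zip(w, self.offset(w, t))]


if __name__ == "__main__":
    mapper = ConditionalMapper(latent_dim=4, text_dim=3)
    w = [0.5, -0.2, 0.1, 0.0]
    # Two different (made-up) prompt embeddings go through the same model.
    print(mapper.edit(w, [1.0, 0.0, 0.0]))
    print(mapper.edit(w, [0.0, 1.0, 0.0]))
```

Because the prompt enters as an input rather than being baked into the weights, a new prompt needs only a new embedding and no new model. The `strength` parameter is an illustrative knob for scaling the edit. Setting it to 0 returns the original latent unchanged.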
