March 18, 2024 | Open Access

Paste and Harmonize via Denoising: Subject-Driven Image Editing with Frozen Pre-Trained Diffusion Model

Abstract

Text-to-image generative models have shown a remarkable ability to produce high-quality images. However, existing methods still struggle to perform exemplar-guided image editing without destroying the identity of the objects in the exemplar image. To address this problem, we propose a new framework called Paste and Harmonize via Denoising, which leverages pre-trained diffusion models to facilitate the text-driven transfer of objects from an exemplar image to the edited image while preserving their appearance and characteristics. The framework consists of two main steps: paste, and harmonize via denoising. In the paste step, an off-the-shelf text-driven model is used to localize the objects in the exemplar image. Pasting the resulting object patches into the edited image naturally transforms the editing task into an image harmonization task. In the harmonize-via-denoising step, we introduce an image harmonization module based on pre-trained diffusion models that blends the inserted object with the target image, producing a coherent and realistic image without compromising synthesis quality while retaining the ability to perform text-driven style-transfer editing. In the experiments, qualitative comparisons with baselines demonstrate that our method achieves impressive, high-fidelity performance in exemplar-based image editing on both training and in-the-wild images. More qualitative and quantitative results can be found at our website.
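The paste step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the object mask has already been produced by some text-driven localization model (not shown), and the function name `paste_object` and the `top`/`left` placement arguments are hypothetical. It only performs the copy-and-paste compositing; the diffusion-based harmonization step is out of scope here.

```python
import numpy as np


def paste_object(exemplar, mask, target, top, left):
    """Paste the masked object from `exemplar` into `target` (hypothetical helper).

    exemplar: (H, W, C) array holding the exemplar image.
    mask:     (H, W) boolean array marking the object; assumed to come from
              an external text-driven localization model.
    target:   (H2, W2, C) array holding the image to edit.
    top/left: row and column in `target` for the patch's upper-left corner.

    Returns the composited image and a target-sized mask of the pasted
    region, which a harmonization stage could use to know where to blend.
    """
    mask = np.asarray(mask, dtype=bool)
    pasted_mask = np.zeros(target.shape[:2], dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # Nothing was localized, so the target is returned unchanged.
        return target.copy(), pasted_mask

    # Crop the exemplar and its mask to the object's bounding box.
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    patch = exemplar[y0:y1, x0:x1]
    patch_mask = mask[y0:y1, x0:x1]
    h, w = patch_mask.shape

    if top < 0 or left < 0 or top + h > target.shape[0] or left + w > target.shape[1]:
        raise ValueError("object patch does not fit inside the target image")

    # Overwrite only the object pixels; background pixels of the patch are skipped.
    out = target.copy()
    region = out[top:top + h, left:left + w]  # view into `out`
    region[patch_mask] = patch[patch_mask]
    pasted_mask[top:top + h, left:left + w] = patch_mask
    return out, pasted_mask
```

The composite produced this way typically shows visible seams and mismatched lighting, which is why the abstract frames the remaining work as an image harmonization problem.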
