Semantic image synthesis aims to generate images from given semantic layouts, a challenging task that requires models to capture the relationship between layouts and images. Previous works are usually based on Generative Adversarial Networks (GANs) or autoregressive (AR) models. However, GAN training is unstable, and the performance of AR models is seriously affected by the independent image encoder and the unidirectional generation bias. Because of these limitations, such methods tend to synthesize unrealistic, poorly aligned images and consider only single-style image generation. In this paper, we propose a Multi-model Style-aware Diffusion Learning (MSDL) framework for semantic image synthesis, consisting of a training module and a sampling module. In the training module, a layout-to-image model is introduced to transfer knowledge from a model pretrained on massive, weakly correlated text-image pairs, making the training process more efficient. In the sampling module, we design a map-guidance technique and a multi-model style-guidance strategy for creating images in multiple styles, e.g., oil painting, Disney cartoon, and pixel style. We evaluate our method on Cityscapes, ADE20K, and COCO-Stuff, making visual comparisons and computing multiple metrics, including FID and LPIPS. Experimental results demonstrate that our model is highly competitive, especially in terms of fidelity and diversity.
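The abstract does not give the formulation of the map-guidance technique. As a rough illustration of how guidance is commonly applied at sampling time in diffusion models, the sketch below uses generic classifier-free-style guidance, which is an assumption and not necessarily the MSDL method. The function name `guided_noise`, the `scale` parameter, and the random arrays standing in for the denoiser's outputs are all hypothetical.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, scale):
    """Blend two noise predictions, classifier-free-guidance style.

    Assumption: this is the generic guidance rule, not the paper's
    map-guidance formula, which the abstract does not specify.
    scale = 0 returns the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Hypothetical stand-ins for a denoiser's outputs at one sampling step:
# one prediction without the semantic map, one conditioned on it.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((3, 8, 8))
eps_map = rng.standard_normal((3, 8, 8))

# Guided prediction that would be used in place of the raw prediction.
eps = guided_noise(eps_uncond, eps_map, scale=2.0)
```

A larger `scale` typically trades sample diversity for closer adherence to the conditioning signal, which in this setting would be the semantic layout.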
Yunfang Niu
Beijing Academy of Artificial Intelligence
Lingxiang Wu
University of Technology Sydney
Yufeng Zhang
ACM Transactions on Multimedia Computing Communications and Applications
Chinese Academy of Sciences
University of Chinese Academy of Sciences
Institute of Automation
Niu et al. (Fri,) studied this question.
synapsesocial.com/papers/68e5d9f4b6db6435875702d7 — DOI: https://doi.org/10.1145/3686155