August 2, 2024 | Open Access

Multi-model Style-aware Diffusion Learning for Semantic Image Synthesis

Abstract

Semantic image synthesis aims to generate images from given semantic layouts, a challenging task that requires models to capture the relationship between layouts and images. Previous works are usually based on Generative Adversarial Networks (GANs) or autoregressive (AR) models. However, GAN training is unstable, and the performance of AR models is seriously affected by the independent image encoder and the unidirectional generation bias. Owing to these limitations, such methods tend to synthesize unrealistic, poorly aligned images and consider only single-style image generation. In this paper, we propose a Multi-model Style-aware Diffusion Learning (MSDL) framework for semantic image synthesis, consisting of a training module and a sampling module. In the training module, a layout-to-image model is introduced to transfer knowledge from a model pretrained on massive weakly correlated text-image pairs, making training more efficient. In the sampling module, we design a map-guidance technique and a multi-model style-guidance strategy for creating images in multiple styles, e.g., oil painting, Disney cartoon, and pixel style. We evaluate our method on Cityscapes, ADE20K, and COCO-Stuff, making visual comparisons and computing multiple metrics such as FID and LPIPS. Experimental results demonstrate that our model is highly competitive, especially in terms of fidelity and diversity.
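The abstract does not say how the map-guidance and style-guidance terms are combined during sampling. As a rough illustration only, the sketch below assumes a classifier-free-guidance-style combination, in which each conditional noise prediction shifts the unconditional one by a weighted difference. The function `guided_noise`, its parameters, and the idea of one noise prediction per style model are hypothetical and are not taken from the paper.

```python
from typing import Sequence

Vector = list[float]


def guided_noise(
    eps_uncond: Vector,
    eps_layout: Vector,
    eps_styles: Sequence[Vector],
    layout_scale: float,
    style_scales: Sequence[float],
) -> Vector:
    """Combine noise predictions in a classifier-free-guidance style.

    Assumed inputs (all hypothetical, for illustration):
      eps_uncond  - prediction with no conditioning
      eps_layout  - prediction conditioned on the semantic layout
      eps_styles  - predictions from style-specific models
    """
    if len(eps_styles) != len(style_scales):
        raise ValueError("one scale per style prediction is required")

    combined = []
    for i, base in enumerate(eps_uncond):
        # Push the estimate toward the layout-conditioned prediction.
        value = base + layout_scale * (eps_layout[i] - base)
        # Add one weighted direction per style model.
        for eps_style, weight in zip(eps_styles, style_scales):
            value += weight * (eps_style[i] - base)
        combined.append(value)
    return combined
```

With `layout_scale=1.0` and no style models, the function returns the layout-conditioned prediction unchanged. Larger scales trade diversity for stronger adherence to the corresponding condition, which is the usual behavior of this kind of guidance.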

Authors

Yunfang Niu

Beijing Academy of Artificial Intelligence

Lingxiang Wu

University of Technology Sydney

Yufeng Zhang

Journals

ACM Transactions on Multimedia Computing Communications and Applications

Institutions

Chinese Academy of Sciences

University of Chinese Academy of Sciences

Institute of Automation
