March 30, 2023

LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation

Abstract

Text-to-image generation aims to generate images from text descriptions. Its main challenges lie in two aspects: (1) Semantic consistency, i.e., the generated images should be semantically consistent with the input text; and (2) Visual reality, i.e., the generated images should look like real images. To ensure text-image consistency, existing works mainly learn to establish cross-modal representations via a text encoder and an image encoder. However, due to the limited representation capability of fixed-length embeddings and the flexibility of free-form text descriptions, the learned text-to-image model is incapable of maintaining semantic consistency between local image regions and fine-grained descriptions. As a result, the generated images sometimes miss fine-grained attributes of the generated object, such as the color or shape of a part of the object. To address this issue, this paper proposes a Local Feature Refinement Based Generative Adversarial Network (LFR-GAN), which first divides the text into independent fine-grained attributes and generates an initial image, then refines the image details based on these attributes. The main contributions are three-fold: (1) An attribute modeling approach is proposed to model the fine-grained text descriptions by mapping them into representations of independent attributes, which provides more fine-grained details for image generation. (2) A local feature refinement approach is proposed to enable the generated image to fully reflect the fine-grained attributes contained in the text description. (3) A multi-stage generation approach is proposed to realize the fine-grained manipulation of complex images progressively, which aims to improve the performance of the refinement and generate photo-realistic images.
Extensive experiments on the CUB and Oxford102 datasets show the effectiveness of our LFR-GAN approach in both text-to-image generation and text-guided image manipulation tasks. Our LFR-GAN approach shows superior performance compared to the state-of-the-art methods. The code will be released at https://github.com/PKU-ICST-MIPL/LFR-GANTOMM2023.
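
The control flow described in the abstract (split the text into attributes, generate an initial image, then refine it attribute by attribute) can be sketched roughly as below. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function names `split_attributes`, `generate_initial`, `refine`, and `generate` are hypothetical, the "image" is a plain dictionary, and the rule-based text splitting stands in for the learned attribute modeling and GAN stages that the paper actually proposes.

```python
import re


def split_attributes(caption):
    """Hypothetical stand-in for attribute modeling: split a caption into
    rough attribute phrases at commas and the words 'and' / 'with'."""
    parts = re.split(r",|\band\b|\bwith\b", caption.lower())
    return [p.strip(" .") for p in parts if p.strip(" .")]


def generate_initial(caption):
    """Placeholder for the initial generation stage. The 'image' here is
    just a dict that records which attributes have been rendered so far."""
    return {"caption": caption, "rendered": []}


def refine(image, attribute):
    """Placeholder for one local refinement step: returns a new 'image'
    in which the given attribute is marked as rendered."""
    return {**image, "rendered": image["rendered"] + [attribute]}


def generate(caption):
    """Multi-stage loop: initial image first, then one refinement pass
    per extracted attribute."""
    image = generate_initial(caption)
    for attribute in split_attributes(caption):
        image = refine(image, attribute)
    return image


if __name__ == "__main__":
    result = generate("A small bird with a red head and white belly.")
    print(result["rendered"])
```

The sketch only illustrates the ordering of the stages; in the approach the abstract describes, each stage would be a trained network operating on image features rather than a dictionary update.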

Authors

Zijun Deng

Shanghai University

Xiangteng He

King University

Yuxin Peng

Guiyang College of Traditional Chinese Medicine

Journals

ACM Transactions on Multimedia Computing Communications and Applications

Institutions

Peking University
