Text-to-image generation aims to generate images from text descriptions. It faces two main challenges: (1) semantic consistency, i.e., the generated images should be semantically consistent with the input text; and (2) visual realism, i.e., the generated images should look like real images. To ensure text-image consistency, existing works mainly learn cross-modal representations via a text encoder and an image encoder. However, because fixed-length embeddings have limited representation capability and free-form text descriptions are highly flexible, the learned text-to-image model cannot maintain semantic consistency between local image regions and fine-grained descriptions. As a result, the generated images sometimes miss fine-grained attributes of the generated object, such as the color or shape of one of its parts. To address this issue, this paper proposes a Local Feature Refinement Based Generative Adversarial Network (LFR-GAN), which first divides the text into several independent fine-grained attributes and generates an initial image, then refines the image details based on these attributes. The main contributions are three-fold: (1) An attribute modeling approach is proposed to model fine-grained text descriptions by mapping them into representations of independent attributes, which provides more fine-grained details for image generation. (2) A local feature refinement approach is proposed so that the generated image fully reflects the fine-grained attributes contained in the text description. (3) A multi-stage generation approach is proposed to manipulate complex images progressively at a fine-grained level, which aims to improve the performance of the refinement and generate photo-realistic images.
Extensive experiments on the CUB and Oxford102 datasets show the effectiveness of our LFR-GAN approach in both text-to-image generation and text-guided image manipulation tasks, where it outperforms state-of-the-art methods. The code will be released at https://github.com/PKU-ICST-MIPL/LFR-GANTOMM2023.
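The control flow described in the abstract (split the description into attributes, generate an initial image, then refine it attribute by attribute) can be illustrated with a toy sketch. This is not the authors' implementation: the names `split_attributes`, `generate_initial`, `refine`, and `Canvas` are hypothetical, the rule-based splitting stands in for the learned attribute modeling, and the "image" is only a record of which attributes have been applied. Applying one attribute per stage is also an assumption made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Canvas:
    """Hypothetical stand-in for an image: records the attributes applied so far."""
    applied: List[str] = field(default_factory=list)


def split_attributes(description: str) -> List[str]:
    """Naively split a caption into phrases on commas, 'and', and 'with'.

    Assumption: a rule-based split replaces the learned attribute modeling.
    The first phrase is usually the subject rather than an attribute.
    """
    parts = re.split(r",|\band\b|\bwith\b", description.lower())
    return [p.strip(" .") for p in parts if p.strip(" .")]


def generate_initial(description: str) -> Canvas:
    """Placeholder for the initial generation stage (no real image is produced)."""
    return Canvas()


def refine(canvas: Canvas, attribute: str) -> Canvas:
    """Placeholder refinement step: note that one more attribute was applied."""
    return Canvas(applied=canvas.applied + [attribute])


def generate(description: str) -> Canvas:
    """Initial generation followed by one refinement stage per phrase."""
    canvas = generate_initial(description)
    for attribute in split_attributes(description):
        canvas = refine(canvas, attribute)
    return canvas


if __name__ == "__main__":
    result = generate("A small bird with a red head, black wings and a white belly.")
    print(result.applied)
```

In a real system each placeholder would be a trained network operating on image features; the sketch only shows the order of the steps.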
Zijun Deng
Shanghai University
Xiangteng He
King University
Yuxin Peng
Guiyang College of Traditional Chinese Medicine
ACM Transactions on Multimedia Computing Communications and Applications
Peking University
Deng et al. (Thu,) studied this question.
synapsesocial.com/papers/6a0eb66f06ecbe833447bab6 — DOI: https://doi.org/10.1145/3589002