What question did this study set out to answer?

The aim is to improve the quality and control of advertisement image layouts and object representations in generated visuals.

June 27, 2026 | Open Access

Transformer-based advertisement image layout generation and object fidelity optimization

Key Points

Developed a two-stage framework called LAYOBJ-GAN for layout-controllable image synthesis.
Introduced a Transformer-based sequence-to-sequence layout generator that captures long-range dependencies.
Proposed a TL-Norm module for object appearance modulation based on textual context and layout constraints.
LAYOBJ-GAN significantly outperforms seven state-of-the-art methods in image quality metrics, layout controllability, and semantic object accuracy.
Achieved substantial quality improvements on complex-scene images compared to previous frameworks.

Abstract

The layout of an image and the spatial distribution of its objects directly affect where the audience looks and how effectively the message is conveyed. A deliberately structured layout can direct the viewer’s attention to the promoted product or to the advertisement’s core message. Existing position-controllable text-to-image methods have made considerable progress on simple images, but the quality of their output is often poor for complex scenes. When such models are used to generate advertisements, they can therefore fail to convey the intended message accurately. To address these limitations, we propose LAYOBJ-GAN, a novel two-stage framework for layout-controllable advertisement image synthesis. Unlike prior work, our method explicitly models background layouts jointly with object layouts during the text-to-layout generation stage, enabling comprehensive spatial planning of complex scenes. Technically, we introduce a Transformer-based sequence-to-sequence layout generator that learns long-range dependencies between textual descriptions and both background and object regions, which has not been explored in previous advertisement-oriented text-to-image frameworks. In the layout-to-image stage, we further propose a fine-grained text–layout interaction normalization module (TL-Norm) that enables object knowledge transfer from a pre-trained category-to-image model, allowing object appearance to be adaptively modulated by textual context and layout constraints. Extensive experiments on MS-COCO and a high-definition advertisement dataset (AsHQ-10K) demonstrate that LAYOBJ-GAN significantly outperforms seven state-of-the-art methods in image quality, layout controllability, and semantic object accuracy. These results confirm the effectiveness of explicitly modeling background layouts and transferring object-level generative knowledge for complex advertisement image synthesis.
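For readers unfamiliar with sequence-to-sequence layout generation, the sketch below shows one common way a layout can be turned into a token sequence that a Transformer decoder could predict. The abstract does not describe how LAYOBJ-GAN serializes layouts, so everything here is an assumption made for illustration: the `Box` type, the `NUM_BINS` resolution, the `<bos>`/`<eos>` markers, and the helper names are hypothetical and are not taken from the paper. Background and object regions are both written as boxes in the same sequence, which mirrors the idea of modeling them jointly.

```python
from dataclasses import dataclass
from typing import List

NUM_BINS = 32  # assumed quantization resolution; not specified in the abstract


@dataclass
class Box:
    label: str  # category name, e.g. "sky" (background) or "bottle" (object)
    x: float    # left edge, normalized to [0, 1]
    y: float    # top edge, normalized to [0, 1]
    w: float    # width, normalized to [0, 1]
    h: float    # height, normalized to [0, 1]


def quantize(value: float, bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * bins), bins - 1)


def dequantize(index: int, bins: int = NUM_BINS) -> float:
    """Map a bin index back to the center of its bin."""
    return (index + 0.5) / bins


def layout_to_tokens(boxes: List[Box]) -> List[str]:
    """Flatten a layout into: <bos>, then (label, x, y, w, h) per box, then <eos>."""
    tokens = ["<bos>"]
    for box in boxes:
        tokens.append(box.label)
        for value in (box.x, box.y, box.w, box.h):
            tokens.append(str(quantize(value)))
    tokens.append("<eos>")
    return tokens


def tokens_to_layout(tokens: List[str]) -> List[Box]:
    """Invert layout_to_tokens, recovering boxes up to quantization error."""
    body = tokens[1:-1]  # drop <bos> and <eos>
    boxes = []
    for i in range(0, len(body), 5):
        label = body[i]
        x, y, w, h = (dequantize(int(t)) for t in body[i + 1:i + 5])
        boxes.append(Box(label, x, y, w, h))
    return boxes
```

In a setup like this, a text encoder would condition the decoder that emits these tokens one at a time, and the decoded boxes would then be passed to the layout-to-image stage. The quantization resolution trades off sequence vocabulary size against spatial precision.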
