What question did this study set out to answer?

The study explores how to generate diverse images of a specific person under text guidance by editing the latent code of an input portrait in the StyleGAN latent space.

April 1, 2026 | Open Access

Natural Text-Driven, Multi-Attribute Editing of Facial Images with Robustness in Sparse Latent Space

Key Points

Used two pre-trained models, CLIP and StyleGAN2.
Edited the input portrait's latent code in the StyleGAN latent space based on text input.
Focused on sparse regions of the generator's latent space and on editing multiple facial attributes simultaneously.
Reported improved diversity in text-guided facial image generation.
Reported robust manipulation of multiple attributes in the latent space.

Abstract

Thanks to advances in GANs and strong models such as StyleGAN, text-driven image editing and image generation have progressed considerably in recent years, but generating diverse images of a specific person under text guidance remains underexplored. This paper combines two pre-trained models, CLIP and StyleGAN2, in a preliminary exploration of that task. A CLIP-based text-driven module edits and manipulates the latent code of the input portrait in the StyleGAN latent space. The approach achieves some good results, particularly in sparse regions of the generator's latent space and when editing multiple attributes at the same time.
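
The abstract does not state the editing objective, so the following is only a minimal sketch of one common way to set up CLIP-guided latent editing, not the paper's method. It assumes the edited code is found by minimizing the summed cosine distance between the image embedding and each text prompt's embedding, plus a penalty that keeps the code near the source portrait's code. Everything named here is hypothetical: `embed_image` is a toy linear map standing in for CLIP's image encoder applied to StyleGAN2's output, the prompt embeddings are random vectors standing in for CLIP text embeddings, and `lam`, `lr`, and the dimensions are arbitrary.

```python
import math
import random

# Toy stand-ins (all hypothetical): in a real pipeline, `embed_image` would be
# CLIP's image encoder applied to StyleGAN2's output G(w), and the text
# targets would come from CLIP's text encoder.
LATENT_DIM, EMBED_DIM = 8, 6
_rng = random.Random(0)
_PROJ = [[_rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(EMBED_DIM)]


def embed_image(w):
    """Stand-in for CLIP_image(G(w)): a fixed linear map of the latent code."""
    return [sum(p * x for p, x in zip(row, w)) for row in _PROJ]


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def edit_loss(w, w_src, text_embeds, lam):
    """Sum of cosine distances to every prompt, plus a penalty that keeps
    the edited code close to the source portrait's code."""
    img = embed_image(w)
    clip_term = sum(1.0 - cosine(img, t) for t in text_embeds)
    stay_close = sum((a - b) ** 2 for a, b in zip(w, w_src))
    return clip_term + lam * stay_close


def numeric_grad(f, w, eps=1e-5):
    """Central finite differences; a real system would use autograd."""
    grad = []
    for i in range(len(w)):
        hi = w[:i] + [w[i] + eps] + w[i + 1:]
        lo = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def edit_latent(w_src, text_embeds, lam=0.05, steps=200, lr=0.5):
    """Gradient descent with step halving, so the loss never increases."""
    f = lambda w: edit_loss(w, w_src, text_embeds, lam)
    w, best = list(w_src), f(w_src)
    for _ in range(steps):
        g = numeric_grad(f, w)
        cand = [x - lr * gx for x, gx in zip(w, g)]
        cand_loss = f(cand)
        if cand_loss < best:
            w, best = cand, cand_loss  # accept only improving steps
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return w


w_source = [_rng.gauss(0, 1) for _ in range(LATENT_DIM)]
# Two attributes edited at once; the random vectors are placeholders for the
# text embeddings of two prompts.
prompts = [[_rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(2)]
w_edited = edit_latent(w_source, prompts)
```

Summing the per-prompt terms is one simple way to express multi-attribute editing in a single objective. The `lam` weight trades edit strength against staying close to the original portrait.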

Cite This Study

Jianpeng Zou studied this question.

synapsesocial.com/papers/69cd7b065652765b073a8bd6 https://doi.org/10.15002/00026277
