September 10, 2025 | Open Access

Evaluation of StyleGAN-CLIP Models in Text-to-Image Generation of Faces

Key Points

Both the text-to-image generation and the editing models based on StyleGAN2 produce high-quality face images.
Evaluation showed that automatic metrics correlate only weakly with human ratings of image quality.
Combining StyleGAN and CLIP models improves results in text-to-image generation tasks.
The analysis highlights the capabilities of StyleGAN-based models in generating faces, while indicating the limitations of automated assessment.
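
To make the "weakly correlated" finding concrete, here is a minimal sketch of how agreement between an automatic metric and human ratings could be measured. It assumes Spearman rank correlation, which the source does not specify, and the scores below are invented purely for illustration; they are not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-image scores, invented for this example only.
metric_scores = [0.31, 0.27, 0.35, 0.22, 0.29, 0.33]
human_ratings = [3.5, 4.0, 3.0, 3.5, 2.5, 4.5]

rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near 0 would indicate that ranking images by the metric says little about how people rank them, which is the kind of outcome the key points describe.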

Abstract

In this paper, we explore the generation of face images conditioned on a textual description, as well as the capabilities of the models in editing a machine-generated image on the basis of additional text prompts. We leverage open-source, state-of-the-art face image generators (StyleGAN models) and couple them with the open-source multimodal embedding space CLIP in an optimisation loop, following the StyleCLIP method, to set up our experimental system. We use automatic metrics and human ratings to evaluate the results and, in addition, gain insight into how strongly the automatic metrics correlate with human ratings. We found compelling evidence that both the text-to-image and editing models based on StyleGAN2 stand out as the better options. In addition, the automatic evaluation metrics are only weakly correlated with human ratings.
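
The optimisation loop mentioned in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the generator is a hypothetical fixed linear map standing in for StyleGAN, plain cosine similarity stands in for CLIP, and gradients are taken by finite differences rather than backpropagation. The loss shape (a text-similarity term plus an L2 penalty that keeps the latent near its starting point) follows the general idea of StyleCLIP-style latent optimisation; the weights and step sizes are assumptions.

```python
import math


def toy_generator(w):
    """Hypothetical stand-in for a StyleGAN generator: latent -> 'image' features."""
    return [w[0] + 0.5 * w[1], w[1] - 0.5 * w[2], w[2] + 0.5 * w[0]]


def cosine(a, b):
    """Cosine similarity, standing in for similarity in CLIP's embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def loss(w, w_start, text_embedding, l2_weight):
    """Text-mismatch term plus a penalty for drifting from the starting latent."""
    text_term = 1.0 - cosine(toy_generator(w), text_embedding)
    l2_term = sum((a - b) ** 2 for a, b in zip(w, w_start))
    return text_term + l2_weight * l2_term


def optimise_latent(w_start, text_embedding, steps=300, lr=0.05,
                    l2_weight=0.01, eps=1e-5):
    """Gradient descent on the latent, using central finite differences."""
    w = list(w_start)
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            up, down = w[:], w[:]
            up[i] += eps
            down[i] -= eps
            diff = (loss(up, w_start, text_embedding, l2_weight)
                    - loss(down, w_start, text_embedding, l2_weight))
            grad.append(diff / (2 * eps))
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical starting latent and 'text embedding', chosen only for illustration.
w_start = [1.0, 0.0, 0.0]
text_embedding = [0.0, 1.0, 0.0]
w_final = optimise_latent(w_start, text_embedding)

print("loss before:", loss(w_start, w_start, text_embedding, 0.01))
print("loss after: ", loss(w_final, w_start, text_embedding, 0.01))
```

In a real system the latent would be a StyleGAN latent code, the similarity would come from CLIP's image and text encoders, and an automatic-differentiation framework would supply the gradients. The L2 term reflects the editing use case, where the result should stay close to the original image.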
