While recent research suggests that Large Language Models match human creative performance in divergent thinking tasks, visual creativity remains underexplored. This study compared images produced by human participants (Visual Artists and Non-Artists) with images produced by an image-generation AI model under two prompting conditions that varied in human input (high for Human-Inspired, low for Self-Guided). The creativity of the resulting images was evaluated by human raters (N = 255) and by GPT-4o acting as an AI rater under two conditions: one strictly mirroring the human rating task, and one using in-context learning with human-rated examples as guidance. We observed a clear creativity gradient: Visual Artists > Non-Artists ≥ Human-Inspired GenAI > Self-Guided GenAI. Increased human guidance strongly improved GenAI's creative output, bringing its productions close to those of Non-Artists. Moreover, while Guided-GPT-4o more closely approximated human creativity judgment patterns, baseline GPT-4o (without guidance) exhibited markedly different creativity evaluations, showing reduced discrimination between image categories and inflated scores for GenAI outputs. These results suggest that, in contrast to language-centered tasks, GenAI models may face unique challenges in visual domains, where creativity depends on perceptual nuance and contextual sensitivity, distinctly human capacities that may not be readily transferable from language models.
Rondini et al. studied this question.