January 23, 2026
Open Access

Divergent creativity in humans and large language models

Antoine Bellemare-Pepin (Kairos, United States), François Lespinasse (Kairos, United States), Philipp Thölke (Université de Montréal)

Key Points

Evaluated the semantic diversity of large language models (LLMs) compared with human divergent thinking capabilities.
Analyzed creative outputs from state-of-the-art LLMs and from a dataset of 100,000 human participants.
Used the Divergent Association Task (DAT) and several creative-writing tasks, all scored with the same objective measures.
Varied linguistic strategy prompts and temperature for the LLMs to assess their impact on performance.
LLMs can exceed average human scores on the DAT.
LLMs approached, but did not surpass, the mean creativity scores of the more creative human participants.
Even the top-performing LLMs were still largely outperformed by the aggregated top half of human participants.
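Temperature, one of the two manipulations listed above, rescales a model's next-token logits before the softmax: values below 1 sharpen the distribution toward the likeliest token, and values above 1 flatten it, which tends to yield more varied output. The following is a minimal plain-Python sketch of that mechanism using made-up logits. The function names are hypothetical, the code is not tied to any particular model's API, and it does not reproduce the study's actual temperature settings.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more varied distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits for a hypothetical three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))

rng = random.Random(0)  # seeded so the draws are reproducible
print([sample_token(logits, 1.5, rng) for _ in range(10)])
```

Because the entropy of the scaled distribution rises with temperature, sweeping temperature is one plausible lever on output diversity. How much it helps semantic divergence in practice depends on the model.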

Abstract

The recent surge of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece has been missing from this discourse: a systematic evaluation of LLMs' semantic diversity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in computational creativity to analyze semantic divergence in both state-of-the-art LLMs and a substantial dataset of 100,000 human participants. These divergence-based measures index associative thinking (the ability to access and combine remote concepts in semantic space), an established facet of creative cognition. We benchmark performance on the Divergent Association Task (DAT) and across multiple creative-writing tasks (haiku, story synopses, and flash fiction), using identical, objective scoring. We found evidence that LLMs can surpass average human performance on the DAT and approach human creative-writing abilities, yet they remain below the mean creativity scores observed among the more creative segment of human participants. Notably, even the top-performing LLMs are still largely outperformed by the aggregated top half of human participants, underscoring a ceiling that current LLMs have yet to break. We also systematically varied linguistic strategy prompts and temperature, observing reliable gains in semantic divergence for several models. Our human-machine benchmarking framework addresses the debate surrounding the imminent replacement of human creative labor by AI, disentangling the quality of the respective creative linguistic outputs using established objective measures.
Beyond prompting deeper exploration of what distinguishes human inventive thought from that of AI systems, we lay out a series of techniques, such as prompt design and hyperparameter tuning, to improve the semantic diversity of LLM outputs.
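The DAT is commonly scored as the average semantic distance between the words a participant produces, computed from word embeddings. The sketch below assumes that formulation: the mean pairwise cosine distance, scaled by 100, over the first seven words that have an embedding. The three-dimensional vectors are invented stand-ins for real pretrained word embeddings, `dat_style_score` and `TOY_EMBEDDINGS` are hypothetical names, and this is not the authors' scoring pipeline.

```python
import itertools
import math

# Hypothetical 3-d "embeddings"; real DAT-style scoring would use pretrained word vectors.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "volcano": [0.0, 1.0, 0.0],
    "algebra": [0.0, 0.0, 1.0],
}


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_style_score(words, embeddings, n_words=7):
    """Mean pairwise cosine distance (x100) over the first n_words words found in embeddings.

    Assumption: words missing from the embedding table are skipped rather than
    penalized; this is an illustrative sketch, not the study's scoring code.
    """
    valid = [w.lower() for w in words if w.lower() in embeddings][:n_words]
    if len(valid) < 2:
        raise ValueError("need at least two words with embeddings")
    pairs = list(itertools.combinations(valid, 2))
    total = sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs)
    return 100.0 * total / len(pairs)


print(dat_style_score(["cat", "volcano", "algebra"], TOY_EMBEDDINGS))  # → 100.0
print(dat_style_score(["cat", "dog"], TOY_EMBEDDINGS))  # close to 0 for near-synonyms
```

Mutually unrelated (orthogonal) toy words score 100, while near-synonyms such as "cat" and "dog" score close to 0. Higher values indicate that the words are more semantically remote from one another, which is the associative-thinking property the measure is meant to index.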
