The affective connotations of words are central to meaning and important predictors of many social processes. As such, understanding the degree to which commercially available generative language models (LLMs) replicate human judgements of affective connotations may help us better understand human-model interactions. LLMs may also serve as useful tools for researchers seeking affective meaning estimates. We test the ability of three LLMs – GPT-4o, Mistral Large, and Llama 3.1 – to estimate human affective connotation ratings of words representing social identities, behaviours, modifiers, and settings in three language cultures: English (US), French (France), and German (Germany). We find that LLM ratings of terms correlate strongly with human ratings. However, their ratings tend to be overly extreme, and patterns of correlations between meaning dimensions only loosely approximate those of human ratings. Consistent with previous findings of English-language and American biases in LLMs, we find that LLMs tend to perform better on English terms, though this pattern varies somewhat by meaning dimension and the type of term in question. We explore how LLMs might contribute to scholarship on affective connotations – by acting as tools for measurement – and how scholarship on affective connotations might contribute to generative language models – by guiding exploration of model biases.
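The two headline comparisons in the abstract (how strongly LLM ratings correlate with human ratings, and whether LLM ratings are more extreme) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `pearson` and `extremity_ratio`, the assumption that ratings sit on a bipolar scale centred at zero, and the toy values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def extremity_ratio(model_ratings, human_ratings):
    """Mean absolute model rating divided by mean absolute human rating.

    Assumes a scale centred at zero (an assumption of this sketch);
    values above 1 mean the model's ratings sit further from the midpoint.
    """
    model_abs = mean(abs(r) for r in model_ratings)
    human_abs = mean(abs(r) for r in human_ratings)
    return model_abs / human_abs


# Toy values invented for illustration only; NOT data from the study.
human = [2.0, -1.5, 0.5, -3.0, 1.0]
model = [3.0, -2.5, 1.0, -4.0, 2.0]

r = pearson(human, model)
ratio = extremity_ratio(model, human)
print(f"correlation: {r:.3f}, extremity ratio: {ratio:.3f}")
```

A high correlation together with an extremity ratio above 1 would match the pattern the abstract describes: the model orders terms much as humans do but pushes its ratings further toward the ends of the scale.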
Combs et al. studied this question.
synapsesocial.com/papers/68ed1896f29694dd1da78ac0 — DOI: https://doi.org/10.1080/02699931.2025.2568551
Aidan Combs
The Ohio State University
Diego Dametto
University of Potsdam
Christophe Blaison
Université Paris Cité
Cognition & Emotion
Duke University
Université Paris Cité
The Ohio State University