August 19, 2025 | Open Access

Evaluating Human-Like Qualities in Language Models

Key Points

LLaMA 3.2 occasionally outperformed human responses in personality and creativity, making it the standout among the tested models.
Five communication traits—naturalness, empathy, creativity, adaptability, and humor—were evaluated across seven language models.
Five human raters scored each model's responses using a multi-trait rubric, across both short and sustained dialogues.
Significant variation in performance across systems suggests that conversational quality may depend more on tuning and stylistic freedom than on model size alone.

Abstract

This paper investigates the human-like communication abilities of modern language models, comparing several open-source and proprietary systems. As LLMs are increasingly deployed in socially interactive roles, ranging from digital companions to mental health support tools, their ability to engage users naturally and expressively has become a critical yet underexplored dimension of evaluation. Traditional benchmarks tend to emphasize accuracy or reasoning, but they fail to capture the nuanced, subjective traits that define human conversation. To address this, seven LLMs were tested in both short and sustained dialogues and scored by five human raters using a multi-trait rubric. Models were assessed on five human-oriented communication traits: naturalness, empathy, creativity, adaptability, and humor/personality. LLaMA 3.2 emerged as a standout, occasionally outperforming human responses in personality and creativity. Results show significant variation across systems, with some matching or exceeding human performance in specific areas, suggesting that conversational quality may depend more on tuning and stylistic freedom than on model scale alone.
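The rubric-based protocol described in the abstract could be aggregated roughly as in the sketch below. This is an illustration only, not the authors' method: the abstract does not specify the rating scale or how scores were combined, so the simple per-trait mean across raters, the function names `aggregate_ratings` and `systems_matching_baseline`, the trait key `humor_personality`, and the placeholder scores in the demo are all assumptions.

```python
from statistics import mean

# The five traits named in the abstract; the key spellings are assumptions.
TRAITS = ("naturalness", "empathy", "creativity", "adaptability", "humor_personality")


def aggregate_ratings(ratings):
    """Average each trait across raters for every system.

    ratings: {system: {rater: {trait: score}}}
    returns: {system: {trait: mean score}}
    """
    summary = {}
    for system, by_rater in ratings.items():
        summary[system] = {
            trait: mean(scores[trait] for scores in by_rater.values())
            for trait in TRAITS
        }
    return summary


def systems_matching_baseline(summary, trait, baseline="human"):
    """Return systems whose mean score on `trait` is at least the baseline's."""
    base = summary[baseline][trait]
    return sorted(
        system
        for system, traits in summary.items()
        if system != baseline and traits[trait] >= base
    )


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are NOT the paper's data.
    demo = {
        "human": {"rater_1": {t: 3 for t in TRAITS}, "rater_2": {t: 4 for t in TRAITS}},
        "model_a": {"rater_1": {t: 4 for t in TRAITS}, "rater_2": {t: 4 for t in TRAITS}},
    }
    summary = aggregate_ratings(demo)
    print(systems_matching_baseline(summary, "creativity"))
```

Treating human-written responses as one more rated "system" keeps the comparison symmetric: the same raters and rubric apply to both, so "matching or exceeding human performance" reduces to a per-trait comparison of means. A fuller analysis would also report inter-rater agreement, which this sketch omits.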
