What question did this study set out to answer?

This research aims to evaluate the sentiment of Large Language Models (LLMs) and humans regarding Artificial General Intelligence (AGI).

April 26, 2026 | Open Access

Towards a societal AI alignment benchmark for evaluating human–machine value convergence

Key Points

The study used a Likert scale survey to assess sentiment.
Seven LLMs, including GPT-4 and Bard, were compared against three independent human sample populations.
Temporal variations in sentiment were evaluated over three consecutive days.
LLM sentiment scores ranged from 3.32 to 4.12 out of 5, with GPT-4 scoring highest.
The human samples recorded a lower average sentiment of 2.97.
Potential conflicts of interest and biases in LLM sentiment formation were identified, suggesting that LLMs could subtly influence societal perceptions.

Abstract

As general-purpose AI systems become increasingly integrated into society for tasks such as information retrieval, content generation, problem-solving, text analysis, coding, and automation, it is crucial to assess their long-term impact on humans. This research explores the sentiment of Large Language Models (LLMs) and humans towards Artificial General Intelligence (AGI) using a Likert scale survey. Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations. Temporal variations in sentiment were also evaluated over three consecutive days. The results show a diversity of sentiment scores among LLMs, ranging from 3.32 to 4.12 out of 5. GPT-4 recorded the most positive sentiment score towards AGI, while Bard leaned towards a neutral sentiment. The human samples showed a lower average sentiment of 2.97. The study’s analysis outlines potential conflicts of interest and biases in the sentiment formation of LLMs. The results indicate that opinions formed within LLMs could subtly influence societal perceptions. To address the need for regulatory oversight and culturally grounded assessments of AI systems, we introduce the Societal AI Alignment Benchmark (SAIA), which leverages multidimensional prompts and empirically validated societal value frameworks to evaluate language model outputs across temporal, model, and multilingual axes. The benchmark is designed to guide policymakers and AI agencies by providing robust, actionable insights into AI alignment with human values, public sentiment, and ethical norms at both national and international levels. Future research should refine the operationalization of the SAIA benchmark and systematically evaluate its effectiveness through empirical testing.
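
The scoring described in the abstract (Likert responses averaged per respondent group and compared across three days) can be illustrated with a minimal sketch. This is not the authors' code: the group names, the item counts, and every response value below are hypothetical placeholders, and a 5-point scale with simple arithmetic means is assumed because the abstract reports scores "out of 5".

```python
from statistics import mean

# Hypothetical 5-point Likert responses (1 = very negative, 5 = very positive).
# These values are invented for illustration and are NOT the study's data.
responses = {
    "model_a": {"day1": [4, 5, 4], "day2": [4, 4, 4], "day3": [5, 4, 4]},
    "human_sample": {"day1": [3, 2, 4], "day2": [3, 3, 3], "day3": [2, 3, 4]},
}


def validate(days, low=1, high=5):
    """Raise ValueError if any response falls outside the assumed scale."""
    for day, scores in days.items():
        for score in scores:
            if not low <= score <= high:
                raise ValueError(f"{day}: response {score} outside {low}-{high}")


def daily_means(days):
    """Mean sentiment per day, used to inspect variation over time."""
    validate(days)
    return {day: mean(scores) for day, scores in days.items()}


def overall_mean(days):
    """Mean sentiment pooled over all days and items."""
    validate(days)
    return mean(score for scores in days.values() for score in scores)


if __name__ == "__main__":
    for group, days in responses.items():
        print(group, daily_means(days), round(overall_mean(days), 2))
```

Pooling all items before averaging is one design choice among several. Averaging the daily means instead gives the same result only when every day has the same number of items.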
