As general-purpose AI systems become increasingly integrated into society for tasks such as information retrieval, content generation, problem-solving, text analysis, coding, and automation, it is crucial to assess their long-term impact on humans. This research explores the sentiment of Large Language Models (LLMs) and humans towards Artificial General Intelligence (AGI). The methodology was a Likert scale survey. Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations. Temporal variations in sentiment were also evaluated over three consecutive days. The results showed a range of sentiment scores among LLMs, from 3.32 to 4.12 out of 5. GPT-4 recorded the most positive sentiment score towards AGI, while Bard leaned towards a neutral sentiment. The human samples showed a lower average sentiment of 2.97. The study’s analysis outlines potential conflicts of interest and biases in the sentiment formation of LLMs. The results indicate that opinions formed within LLMs could subtly influence societal perceptions. To address the need for regulatory oversight and culturally grounded assessments of AI systems, we introduce the Societal AI Alignment Benchmark (SAIA), which leverages multidimensional prompts and empirically validated societal value frameworks to evaluate language model outputs across temporal, model, and multilingual axes. This benchmark is designed to guide policymakers and AI agencies by providing robust, actionable insights into AI alignment with human values, public sentiment, and ethical norms at both national and international levels. Future research should refine the operationalization of the SAIA benchmark and systematically evaluate its effectiveness through empirical testing.
Bojic et al. (Thu,) studied this question.
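
As a rough illustration of how a Likert scale survey yields a sentiment score out of 5, the sketch below averages 5-point responses per respondent. It is only a minimal sketch: the abstract does not describe the study's actual scoring procedure, so the plain mean is an assumption, and the respondent names, the `sentiment_score` helper, and all response values are hypothetical and are not data from the study.

```python
from statistics import mean

# Hypothetical 5-point Likert responses (1 = very negative, 5 = very positive).
# These values are invented for illustration; they are NOT data from the study.
responses_by_respondent = {
    "model_a": [4, 5, 4, 3, 4],
    "model_b": [3, 3, 4, 3, 3],
    "human_sample": [3, 2, 3, 4, 3],
}


def sentiment_score(responses):
    """Return the mean of 1-5 Likert responses, rounded to two decimals.

    Assumption: the score is a simple arithmetic mean over all items.
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("Likert responses must lie between 1 and 5")
    return round(mean(responses), 2)


# One aggregate score per respondent, comparable across models and humans.
scores = {
    name: sentiment_score(responses)
    for name, responses in responses_by_respondent.items()
}

for name, score in scores.items():
    print(f"{name}: {score} out of 5")
```

Repeating the same calculation on responses collected on different days would give one simple way to compare sentiment over time, again assuming a plain mean is the aggregate of interest.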