Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

Abstract

Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, which were predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding, and that training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement to build more socially-aware LLMs. The resources are released at https://github.com/minjechoi/SOCKET.

Authors

Minje Choi

Stanford University

Jiaxin Pei

Palo Alto University

Sagar Kumar

Boston Children's Hospital

Institutions

University of Michigan

University of Cambridge

Northeastern University

Cite this study

Choi et al. (2023) studied this question.

synapsesocial.com/papers/6a089a00afa0a1b8dbde00cd — DOI: https://doi.org/10.18653/v1/2023.emnlp-main.699

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

An indirect measure of discrete emotions · 2019 · 10 citations
Language as context for the perception of emotion · 2007 · 698 citations
A Large Self-Annotated Corpus for Sarcasm · 2017 · 134 citations
The Development of Social Knowledge: Morality and Convention · 1985 · 2,689 citations
Current Emotion Research in the Language Sciences · 2012 · 277 citations