This study introduces a novel, human-centered framework for evaluating the holistic intelligence of large language models (LLMs), using behavioral theory and experimental benchmarks drawn from human intelligence. Through extensive online experiments, the framework reveals that GPT-4 outperforms humans in cognitive, emotional, and creative intelligence, but falls short in social intelligence, especially in social interest, self-efficacy, and understanding mental states. Beyond theoretical insight, the study validates this framework by assessing GPT-4’s impact across diverse job roles, finding results consistent with established labor market research. It also offers a reusable tool for firms and policymakers to evaluate LLM intelligence and forecast job-level impacts. This enables informed decisions about where and how to integrate LLMs, match models to specific job requirements, and identify risks in socially intensive roles. The framework provides a foundation for responsible LLM deployment, ensuring alignment with human-centered structures and supporting strategic workforce planning.
Wang et al. studied this question.