October 19, 2025 · Open Access

Reasoning-based LLMs surpass average human performance on medical social skills

Key Points

The top-scoring LLM reached 97.5% on social skills questions, surpassing average human performance.
The best-performing LLM, o1, answered 39 of 40 questions correctly, indicating high accuracy in complex scenarios.
The evaluation used forty USMLE-style questions from the UWORLD question bank, spanning several social skills categories.
These findings underscore LLMs' potential to enhance clinical training and patient care through robust social skill responses.

Abstract

A significant portion of medical licensing examinations assesses key social skills such as communication, ethics, and professionalism, which are vital for quality patient care. Artificial intelligence (AI) has been increasingly integrated into healthcare systems in recent years, raising concerns among regulators, providers, and patients regarding AI’s capacity to handle complex, human-centered scenarios. Previous work has shown that large language models (LLMs) like GPT-3.5 and GPT-4 perform well on social skills questions from the United States Medical Licensing Examination (USMLE). However, newer models like GPT-4o, Gemini 1.5 Pro, and o1 have since been introduced, with the latter designed to mimic human thinking through “chain of thought” reasoning, unlike other LLMs that provide instantaneous answers. The impact of reasoning on LLMs’ ability to navigate scenarios requiring social skills remains unclear. Here, we evaluate five LLMs (GPT-4, GPT-4o, Gemini 1.5 Pro, o1-preview, and its full version, o1) using forty USMLE-style social skills questions from the UWORLD question bank covering several categories: communication [...] changing answers frequently, primarily to incorrect ones, reduced its overall ranking from second to fourth. This phenomenon was not observed in any other model, including the final o1 release, which maintained consistent, high-level performance. These findings, along with prior work, highlight the potential of LLMs to answer knowledge-based social skills questions effectively in a medical context, sometimes surpassing average human performance. As LLMs continue to grow in size and sophistication, their performance is expected to improve further. In particular, the strong performance of reasoning-based LLMs suggests that such architectures hold significant promise for advancing AI’s role in socially oriented tasks.
These results demonstrate the growing potential for reasoning-based LLMs to complement and enhance clinical training, medical education, and patient care.

Authors

Khalid Ibraheem Alohali

Laura Asaad Almusaeeb

Abdulaziz Abdulrahman Almubarak

Journals

Scientific Reports

Institutions

King Saud University
