Recently, the application of large language models (LLMs) in psychology has attracted increasing attention, yet their psychological competence still requires further investigation. This study examines the question through Chinese psychological knowledge question answering (QA). Specifically, we constructed a dedicated dataset from the Chinese qualification examinations for psychological counselors and psychotherapists. We then evaluated dense, Mixture-of-Experts (MoE), and reasoning LLMs of varying parameter sizes under different evaluation modes in the Chinese context, measuring answer accuracy in both closed-ended and open-ended settings. The experimental results showed that larger and more recent LLMs achieved higher accuracy in psychological QA. Few-shot learning improved accuracy, whereas Chain-of-Thought prompting and reasoning LLMs yielded only limited gains. Notably, LLMs were more accurate in closed-ended settings than in open-ended ones. Furthermore, error analysis indicated that LLMs can produce incorrect or hallucinated responses, primarily because of insufficient psychological knowledge and conceptual confusion. Although current LLMs show promise in psychological QA tasks, users should be cautious about over-relying on their responses, and a complementary, human-AI collaborative approach is recommended for practical use.
Gao et al. studied this question.