August 21, 2024 · Open Access

Human-AI Collaboration Supporting GPT-4o Achieving Human-Level User Feedback in Emotional Support Conversations: Integrative Modeling and Prompt Engineering Approaches (Preprint)

Yinghui Huang (Wuhan University of Technology), Lie Li (Nanyang Technological University), Wanghao Dong

Abstract

BACKGROUND Emotional support plays a crucial role in enhancing social interactions, facilitating psychological interventions, and improving customer service outcomes by addressing individuals' emotional needs. The emergence of large language models (LLMs) holds promise for delivering emotional support at scale, but their effectiveness compared with human counselors is still not well understood. Evaluating and enhancing the emotional support capabilities of LLMs through targeted, user-centered strategies is crucial for their successful real-world integration.

OBJECTIVE This study aims to evaluate the emotional support capabilities of LLMs, specifically GPT-4o, and to introduce an integrative automatic evaluation framework centered on user-perceived feedback. The framework is designed to enhance LLM performance in emotional support conversations (ESCs) by identifying psycholinguistic clues as intrinsic evaluation metrics and by leveraging a customized Chain-of-Thought (CoT) prompting framework.

METHODS The study used a dataset of emotional support conversations from human counselors. An explanatory predictive model was developed using explainable artificial intelligence methods, following an integrative modeling paradigm rooted in computational social science. The model evaluated and interpreted user-perceived feedback scores for GPT-4o. Additionally, the study integrated Hill's three-stage model of helping into a manually customized CoT prompting framework to systematically evaluate GPT-4o's performance in ESCs.

RESULTS GPT-4o achieved high user-perceived feedback scores and was relatively stable in its performance, but it still trails significantly behind human counselors overall (Cliff's Delta = 0.087, p […]

CONCLUSIONS This study provides preliminary evidence of GPT-4o's emotional support capabilities and proposes a user-perceived feedback-centered integrative evaluation framework for ESCs. The findings suggest a cautiously optimistic outlook for the application of advanced LLMs in emotional support services, although significant challenges remain, particularly in deepening the exploration in conversations and personalizing language. The proposed framework encourages the integration of human expertise into LLMs, enhancing their efficacy and advancing the development of trustworthy AI-based emotional support services.
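
The RESULTS paragraph reports the gap between human counselors and GPT-4o as a Cliff's Delta. For readers unfamiliar with that effect size, the following is a minimal sketch of its standard definition: the share of cross-group pairs in which the first group's score is higher, minus the share in which it is lower. The function name `cliffs_delta` and the score lists are assumptions made up for illustration; they are not the study's code or data, and the abstract does not state which group was treated as the first one.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs.

    Ranges from -1 (every x below every y) to +1 (every x above every y);
    0 means neither group tends to score higher than the other.
    """
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    # Count pairs where the first group's score wins or loses; ties count as neither.
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical feedback scores, invented purely for illustration.
human_scores = [4, 5, 3, 5, 4]
model_scores = [4, 4, 3, 5, 3]

delta = cliffs_delta(human_scores, model_scores)
print(f"Cliff's delta = {delta:.3f}")
```

Because the statistic depends only on pairwise orderings, it suits ordinal ratings such as feedback scores, where differences between scale points need not be equal. This pairwise version is quadratic in the sample sizes, which is fine for a sketch but slow for very large datasets.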
