This study explores how users understand moral risks in healthcare generative AI chatbots (HGACs) and evaluates whether large language models (LLMs) can effectively simulate these human perceptions. Grounded in moral foundations theory and the coping model of user adaptation, we employ a three-stage mixed-methods design comparing LLM-simulated and human respondents. Stage 1 interviews identified five primary risks: health disinformation, bias and discrimination, privacy data leakage, unclear accountability, and malicious guidance. Subsequent PLS-SEM and artificial neural network analyses examined linear and non-linear behavioral relationships. Results indicate that while LLMs achieve qualitative performance comparable to humans (66.7% accuracy, 61.5% recall), they underperform in quantitative contexts because of repetitive, incomplete, and confusing responses and short-lived prompts. Our findings provide a nuanced understanding of the moral risks of HGACs and delineate the boundaries of LLMs as substitutes for human participants in behavioral research, offering a framework for future AI-augmented methodological designs.
Yang et al. (Mon,) studied this question.