This study investigates the moral reasoning capabilities of large language models (LLMs), focusing on biases and the extent to which outputs reflect training data patterns rather than genuine reasoning. Using the Moral Competence Test (MCT) and the Moral Foundations Questionnaire (MFQ), we compared responses from human participants with those from LLM-based chatbots such as ChatGPT. MCT results show that humans consistently outperform LLMs, indicating higher moral competence. MFQ responses from LLMs emphasize harm/care and fairness/reciprocity but under-represent loyalty, authority, and purity. This pattern suggests a data-proportionality effect, in which moral emphasis mirrors the prevalence of certain values in the training data. Additionally, fine-tuning methods such as reinforcement learning from human feedback (RLHF) may amplify specific moral norms. When LLMs are widely deployed, these imbalances could unintentionally shape users' moral intuitions and societal norms. Our findings underscore the need for continuous auditing and alignment to ensure that LLMs provide ethically balanced and socially responsible guidance in morally sensitive applications.
Bajpai et al. studied this question.