What type of study is this?

This is a quantitative study.

October 17, 2025 | Open Access

Mapping Moral Reasoning in LLMs: A Multi-Dimensional Analysis of Safety Principle Conflicts

Key Points

Statistically significant differences in reasoning style and moral salience were observed (p < 0.01), highlighting conflicts in moral priorities.
The study utilized eleven-dimensional normative profiles to evaluate LLM responses to ethical dilemmas.
Quantitative metrics (entropy, top-alignment ratio, and reasoning density) captured variation in ethical framing, complexity, and safety prioritization across 90 model generations.
The robustness of the reasoning clusters was confirmed by an ablation without dictionary features (Adjusted Rand Index = 0.475), supporting the value of semantic alignment in AI evaluations.

Abstract

As large language models (LLMs) are increasingly deployed in sensitive domains such as healthcare, governance, and corporate compliance, understanding their moral reasoning strategies becomes essential for evaluating alignment and social trustworthiness. This paper presents a structured analysis of how open-weight LLMs resolve conflicts between competing safety principles—including public welfare, institutional transparency, and individual rights—using carefully designed ethical dilemmas. Each model response is encoded into an eleven-dimensional normative profile, derived from both semantic similarity to canonical ethical theories and dictionary-based moral cues. A set of quantitative metrics, including entropy, top-alignment ratio, and reasoning density, captures variation in ethical framing, complexity, and safety prioritization across 90 model generations. Statistically significant differences emerge in reasoning style and moral salience (p < 0.01), while PCA and clustering reveal three recurring behavioral patterns: rule-based, balanced, and pragmatic integration. An ablation study confirms that these clusters persist without dictionary features (Adjusted Rand Index = 0.475), supporting the robustness of semantic alignment. This work contributes a replicable methodology for the moral profiling of LLMs, offering empirical tools for diagnosing value conflicts and informing future efforts in AI transparency, contestability, and pluralistic alignment. The findings underscore the need for interpretable metrics and diverse normative baselines in the evaluation of automated decision systems.
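To make two of the metrics named in the abstract more concrete, the following is a minimal sketch of how entropy and a top-alignment ratio could be computed over an eleven-dimensional normative profile. The abstract does not give the paper's exact definitions, so everything here is an assumption: the profile is treated as a vector of non-negative scores normalized to sum to 1, entropy is taken to be Shannon entropy in bits, and the top-alignment ratio is taken to be the share of mass on the strongest dimension. The function names and the example scores are hypothetical.

```python
import math


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (assumed preprocessing step)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def shannon_entropy(profile):
    """Shannon entropy in bits of a normalized profile.

    Higher values mean the response spreads its weight across more
    ethical dimensions; lower values mean it concentrates on a few.
    """
    return -sum(p * math.log2(p) for p in profile if p > 0)


def top_alignment_ratio(profile):
    """Share of total mass on the strongest dimension (assumed definition)."""
    return max(profile)


# Hypothetical 11-dimensional profiles, not data from the paper.
focused = normalize([10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])  # one dominant dimension
uniform = normalize([1] * 11)                            # evenly spread weight

print("focused:", shannon_entropy(focused), top_alignment_ratio(focused))
print("uniform:", shannon_entropy(uniform), top_alignment_ratio(uniform))
```

Under these assumptions, a response dominated by a single ethical framing yields low entropy and a high top-alignment ratio, while a response that balances several framings yields the opposite pattern. The paper's own metrics may be defined differently.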
