May 8, 2026 · Open Access

AMULED: Addressing Moral Uncertainty using Large language models for Ethical Decision-making

Rohit K. Dubey (ETH Zurich), Damian Dailisan, Sachit Mahajan (ETH Zurich)

Key Points

This research aims to improve ethical decision-making in reinforcement learning by integrating multiple ethical theories.
Developed AMULED, a task-agnostic ethical layer that uses large language models to refine a pre-trained RL agent.
Aggregated belief values from five moral clusters using Belief Jensen–Shannon Divergence (BJSD) and Dempster–Shafer Theory (DST), and used the resulting probability scores as shaping rewards.
Evaluated the framework in two environments with multiple LLM backbones and alternative aggregation methods, using 50-run replicates.
In Finding Milk, desirable actions increased by 63.1% (more crying babies attended) and undesirable actions decreased by 60.3% (fewer sleeping babies disturbed), with only a 5.1% increase in path length.
In Driving and Rescuing, AMULED rescued 38.4% more targets than agents trained with human feedback while maintaining lower collision rates.
BJSD-DST aggregation outperformed standard methods such as voting and averaging at handling conflicting moral signals.
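
The aggregation step named in the key points can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each moral cluster's belief is a probability vector over singleton actions (so Dempster's rule reduces to a normalized elementwise product), and it follows one common divergence-weighted fusion recipe in which sources that diverge more from the others receive less weight. The function names and the example belief values are hypothetical.

```python
import numpy as np


def bjs_divergence(m1, m2):
    """Belief Jensen-Shannon divergence between two mass vectors (base-2 logs)."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    mid = 0.5 * (m1 + m2)

    def kl(p, q):
        mask = p > 0  # terms with zero mass contribute nothing
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    return 0.5 * kl(m1, mid) + 0.5 * kl(m2, mid)


def dempster_combine(m1, m2):
    """Dempster's rule, restricted to singleton hypotheses (an assumption)."""
    joint = m1 * m2
    total = joint.sum()
    if total == 0:
        raise ValueError("total conflict: no action is supported by both sources")
    return joint / total


def aggregate_beliefs(beliefs):
    """Fuse an (n_sources, n_actions) belief matrix into one probability vector."""
    beliefs = np.asarray(beliefs, dtype=float)
    n = beliefs.shape[0]
    if n < 2:
        raise ValueError("need at least two belief sources")
    # Pairwise divergences; the diagonal is zero.
    div = np.array([[bjs_divergence(beliefs[i], beliefs[j]) for j in range(n)]
                    for i in range(n)])
    avg_div = div.sum(axis=1) / (n - 1)
    # Sources that disagree with the rest get lower credibility.
    support = 1.0 / (avg_div + 1e-12)
    credibility = support / support.sum()
    weighted = credibility @ beliefs
    # Combine the credibility-weighted evidence with itself n - 1 times.
    fused = weighted
    for _ in range(n - 1):
        fused = dempster_combine(fused, weighted)
    return fused


# Hypothetical beliefs from five moral clusters over three candidate actions.
example_beliefs = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
    [0.70, 0.15, 0.15],
    [0.10, 0.20, 0.70],  # one dissenting cluster
])
fused_scores = aggregate_beliefs(example_beliefs)
print(fused_scores)
```

In this sketch the dissenting cluster is down-weighted rather than outvoted or averaged in at full strength, which is the general intuition behind divergence-weighted evidence fusion.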

Abstract

Introduction: We address moral uncertainty in reinforcement learning (RL) by proposing a framework that integrates multiple ethical theories into decision-making. Existing approaches rely on single moral frameworks or handcrafted rewards, limiting scalability and failing to capture moral pluralism. We introduce AMULED, a task-agnostic ethical layer that refines a pre-trained RL agent using large language models (LLMs) to provide multi-perspective moral feedback.

Methods: Following initial training, the RL model is fine-tuned using LLM-generated feedback in place of human feedback. Five moral clusters (consequentialist, deontological, virtue, care, and social justice) assign belief values to candidate actions. These beliefs are aggregated using Belief Jensen–Shannon Divergence and Dempster–Shafer Theory to produce probability scores that serve as shaping rewards, while a KL-regularization term constrains deviation from the base policy. The framework is evaluated across two environments (Finding Milk and Driving and Rescuing), multiple LLM backbones, and alternative belief aggregation methods, with 50-run replicates.

Results: AMULED improves ethical behavior without substantially degrading task performance. In Finding Milk, it increases desirable actions (63.1% more crying babies attended) and reduces undesirable actions (60.3% fewer sleeping babies disturbed), with only a 5.1% increase in path length. In Driving and Rescuing, it balances competing objectives more effectively than baselines, rescuing 38.4% more targets than human-feedback agents while maintaining lower collision rates and reduced policy degradation. Across experiments, BJSD-DST aggregation outperforms standard methods (e.g., voting, averaging) in handling conflicting moral signals and achieves the best overall performance on most metrics.

Discussion: AMULED operationalizes moral pluralism through scalable, LLM-based feedback and provides a principled mechanism for resolving conflicting ethical signals. The framework demonstrates robustness across tasks and model variants, though performance depends on LLM reasoning quality and can degrade in spatially complex settings. These results suggest that LLM-driven belief aggregation offers a practical alternative to handcrafted rewards and human supervision for ethical decision-making in RL.
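
The abstract describes probability scores used as shaping rewards, with a KL-regularization term limiting drift from the base policy. The sketch below shows one plausible way such a reward could be formed for a discrete action space. The exact formulation in the paper is not given here, so the functional form, the coefficient `beta`, and all function names are assumptions made for illustration.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete action distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # zero-probability actions under p contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))


def shaped_reward(ethical_score, policy_probs, base_probs, beta=0.1):
    """Ethical shaping reward minus a KL penalty toward the base policy.

    ethical_score: aggregated probability score for the chosen action
    policy_probs:  fine-tuned policy's action distribution in this state
    base_probs:    pre-trained (base) policy's action distribution
    beta:          hypothetical penalty weight, not taken from the paper
    """
    return ethical_score - beta * kl_divergence(policy_probs, base_probs)


# No drift from the base policy: the reward equals the ethical score.
unchanged = shaped_reward(0.8, [0.5, 0.5], [0.5, 0.5])

# Drift from the base policy: the same ethical score is discounted.
drifted = shaped_reward(0.8, [0.5, 0.5], [0.25, 0.75])
print(unchanged, drifted)
```

A larger `beta` keeps the fine-tuned agent closer to its original task behavior, which is one way to read the reported small increase in path length alongside the ethical gains.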