Introduction
We address moral uncertainty in reinforcement learning (RL) by proposing a framework that integrates multiple ethical theories into decision-making. Existing approaches rely on single moral frameworks or handcrafted rewards, which limits scalability and fails to capture moral pluralism. We introduce AMULED, a task-agnostic ethical layer that refines a pre-trained RL agent using large language models (LLMs) to provide multi-perspective moral feedback.

Methods
Following initial training, the RL model is fine-tuned using LLM-generated feedback in place of human feedback. Five moral clusters (consequentialist, deontological, virtue, care, and social justice) assign belief values to candidate actions. These beliefs are aggregated using Belief Jensen–Shannon Divergence (BJSD) and Dempster–Shafer Theory (DST) to produce probability scores that serve as shaping rewards, while a KL-regularization term constrains deviation from the base policy. The framework is evaluated across two environments (Finding Milk and Driving and Rescuing), multiple LLM backbones, and alternative belief aggregation methods, with 50 replicate runs.

Results
AMULED improves ethical behavior without substantially degrading task performance. In Finding Milk, it increases desirable actions (63.1% more crying babies attended) and reduces undesirable actions (60.3% fewer sleeping babies disturbed), with only a 5.1% increase in path length. In Driving and Rescuing, it balances competing objectives more effectively than baselines, rescuing 38.4% more targets than human-feedback agents while maintaining lower collision rates and reduced policy degradation. Across experiments, BJSD-DST aggregation outperforms standard methods (e.g., voting, averaging) in handling conflicting moral signals and achieves the best overall performance on most metrics.

Discussion
AMULED operationalizes moral pluralism through scalable, LLM-based feedback and provides a principled mechanism for resolving conflicting ethical signals. The framework is robust across tasks and model variants, though performance depends on LLM reasoning quality and can degrade in spatially complex settings. These results suggest that LLM-driven belief aggregation offers a practical alternative to handcrafted rewards and human supervision for ethical decision-making in RL.
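
The aggregation step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each moral cluster reports a probability-like belief vector over the same candidate actions, with all mass on single actions, so Dempster's rule reduces to a normalized product. The divergence-based weighting, the function names (`js_divergence`, `dempster_singletons`, `aggregate`, `shaped_reward`), the per-sample log-ratio form of the KL penalty, and the coefficient `beta` are all hypothetical choices made for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two belief vectors."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dempster_singletons(m1, m2):
    """Dempster's rule, assuming all mass sits on single actions."""
    joint = [a * b for a, b in zip(m1, m2)]
    agreement = sum(joint)  # equals 1 - K, where K is the conflicting mass
    if agreement == 0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return [j / agreement for j in joint]

def aggregate(beliefs):
    """Fuse belief vectors from n >= 2 sources into one score per action."""
    n = len(beliefs)
    # Average divergence of each source from all the others.
    avg_div = [
        sum(js_divergence(beliefs[i], beliefs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Sources that disagree with the rest get less weight (assumed scheme).
    support = [1.0 / (d + 1e-9) for d in avg_div]
    total = sum(support)
    weights = [s / total for s in support]
    # Credibility-weighted average belief over actions.
    k = len(beliefs[0])
    weighted = [sum(w * b[a] for w, b in zip(weights, beliefs)) for a in range(k)]
    # Combine the weighted belief with itself n - 1 times.
    fused = weighted
    for _ in range(n - 1):
        fused = dempster_singletons(fused, weighted)
    return fused

def shaped_reward(moral_score, logp_new, logp_base, beta=0.1):
    """Moral score minus a per-sample KL-style penalty (assumed form)."""
    return moral_score - beta * (logp_new - logp_base)

# Hypothetical beliefs of five clusters over three candidate actions.
beliefs = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
    [0.70, 0.15, 0.15],
    [0.10, 0.20, 0.70],  # one dissenting cluster
]
scores = aggregate(beliefs)
```

Under these assumptions the dissenting cluster is down-weighted before fusion, so the fused scores concentrate on the action most clusters favor instead of being split by a single conflicting signal, which is the failure mode that plain voting or averaging handles less gracefully.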
Dubey et al. studied this question.