September 10, 2025 | Open Access

Bayesian CIRL: A Unified Framework for Adaptive and Trustworthy Human-Agent Collaboration

Key Points

Bayesian CIRL offers more robust and accurate value alignment than standard IRL algorithms, improving safety.
It uses probabilistic beliefs about human reward functions, allowing agents to adapt through observing human behavior.
This approach handles ambiguity effectively, facilitating reliable interactions between humans and agents.
The method treats value alignment as a cooperative game, maximizing uncertain joint rewards through adaptive cooperation.

Abstract

With the rapid development of agentic AI, autonomous agents that act on behalf of humans need to align with human values to attain their objectives, despite initially having limited knowledge of those values. Using traditional reinforcement learning (RL) in agentic AI may yield incorrectly formulated reward functions, which in turn lead to unsafe behaviours. Furthermore, RL agents suffer from sampling difficulties, reward misalignment, and a lack of cross-domain generalization. In this paper, we introduce Bayesian Cooperative Inverse Reinforcement Learning (Bayesian CIRL), a new method that models the human reward function as a latent variable and maintains a probabilistic belief over it. Unlike traditional inverse reinforcement learning (IRL), Bayesian CIRL treats value alignment as a cooperative game in which the agent updates its beliefs by observing human actions and plans its own actions to maximize the uncertain joint reward. This approach allows adaptive cooperation through active learning. It also transfers learned behaviour to reduce uncertainty over human inclinations. Experimental tests show that Bayesian CIRL offers more robust and accurate value alignment than standard IRL algorithms. It handles ambiguity better and allows for reliable interaction between humans and agents. This framework provides a clear way to introduce agentic AI systems that align with human ethics and societal expectations.

Key Words: Agentic AI, Reinforcement Learning, Inverse Reinforcement Learning, Cooperative Inverse Reinforcement Learning, Bayesian CIRL
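The belief update described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the three candidate reward functions, the Boltzmann-rational human model, the `beta` rationality parameter, and all function names are assumptions chosen for illustration, and the agent here acts greedily on its current belief rather than solving the full cooperative planning problem.

```python
import math

# Hypothetical toy setup (not from the paper): three candidate reward
# functions over three actions. REWARDS[k][a] is the reward of action a
# if hypothesis k were the human's true reward function.
REWARDS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def human_likelihood(hypothesis, action, beta=5.0):
    """P(action | hypothesis) for an assumed Boltzmann-rational human."""
    weights = [math.exp(beta * r) for r in REWARDS[hypothesis]]
    return weights[action] / sum(weights)


def update_belief(belief, action, beta=5.0):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnorm = [p * human_likelihood(k, action, beta) for k, p in enumerate(belief)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def best_agent_action(belief):
    """Greedy choice: the action with the highest expected reward under the belief."""
    n_actions = len(REWARDS[0])
    expected = [
        sum(p * REWARDS[k][a] for k, p in enumerate(belief))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: expected[a])


# Start from a uniform prior and observe a short, slightly noisy
# sequence of human actions (assumed data for illustration).
belief = [1 / 3, 1 / 3, 1 / 3]
for observed_action in [1, 1, 2, 1]:
    belief = update_belief(belief, observed_action)

agent_action = best_agent_action(belief)
print([round(p, 4) for p in belief], agent_action)
```

Each observed human action shifts probability mass toward the reward hypotheses that explain it best, and the agent's choice follows the updated belief. A fuller treatment would plan over future belief states instead of acting greedily.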
