Abstract - With the rapid development of agentic AI, in which autonomous agents act on behalf of humans, agents need to align with human values to attain their objectives, despite initially having limited knowledge of those values. Applying traditional reinforcement learning (RL) to agentic AI can lead to misspecified reward functions and, in turn, unsafe behaviours. Furthermore, RL agents face sampling difficulties, reward misalignment, and a lack of cross-domain generalization. In this paper, we introduce Bayesian Cooperative Inverse Reinforcement Learning (Bayesian CIRL), a new method that treats the human reward function as a latent variable and maintains a probabilistic belief over it. Unlike traditional inverse reinforcement learning (IRL), Bayesian CIRL treats value alignment as a cooperative game in which the agent updates its beliefs by observing human actions and plans its own actions to maximize the uncertain joint reward. This approach enables adaptive cooperation through active learning. It also transfers learned behaviour to reduce uncertainty about human preferences. Experimental tests show that Bayesian CIRL offers more robust and accurate value alignment than standard IRL algorithms. It handles ambiguity better and allows for reliable interaction between humans and agents. This framework provides a clear path toward agentic AI systems that align with human ethics and societal expectations.
Key Words: Agentic AI, Reinforcement Learning, Inverse Reinforcement Learning, Cooperative Inverse Reinforcement Learning, Bayesian CIRL
Gayathri et al. (Wed,) studied this question.
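The belief update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a discrete set of candidate reward hypotheses, a one-step decision, and a Boltzmann-rational (noisily optimal) human model, none of which are specified in the abstract. The names `rewards`, `beta`, `update_belief`, and `best_agent_action` are hypothetical and chosen only for illustration.

```python
import math

def boltzmann_likelihood(action, theta, rewards, beta=2.0):
    """P(action | theta) under an assumed noisily rational human model."""
    weights = [math.exp(beta * r) for r in rewards[theta]]
    return weights[action] / sum(weights)

def update_belief(belief, action, rewards, beta=2.0):
    """Bayes' rule: posterior is proportional to likelihood times prior."""
    unnormalized = {
        theta: prior * boltzmann_likelihood(action, theta, rewards, beta)
        for theta, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {theta: value / total for theta, value in unnormalized.items()}

def best_agent_action(belief, rewards):
    """Pick the action with the highest expected reward under the belief."""
    n_actions = len(next(iter(rewards.values())))
    expected = [
        sum(belief[theta] * rewards[theta][a] for theta in belief)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: expected[a])

# Hypothetical toy problem: two reward hypotheses, two actions.
# rewards[theta][a] is the reward of action a if hypothesis theta is true.
rewards = {"prefers_coffee": [1.0, 0.0], "prefers_tea": [0.0, 1.0]}
belief = {"prefers_coffee": 0.5, "prefers_tea": 0.5}  # uniform prior

# The agent observes the human choose action 1 and updates its belief.
belief = update_belief(belief, action=1, rewards=rewards)
agent_action = best_agent_action(belief, rewards)
```

After observing the human choose action 1, the belief shifts toward `prefers_tea`, and the agent selects the action that maximizes expected reward under that updated belief. A full CIRL formulation would plan over sequences of states and actions rather than a single step.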