Bayesian Reward Models for LLM Alignment | Synapse