Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | Synapse