Robust Preference Optimization through Reward Model Distillation | Synapse