Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model | Synapse