Generalizing Reward Modeling for Out-of-Distribution Preference Learning