Learning Optimal Advantage from Preferences and Mistaking It for Reward