REBEL: Reinforcement Learning via Regressing Relative Rewards | Synapse