Tell my why: Training preferences-based RL with human preferences and step-level explanations