Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning | Synapse