Combining policy gradient and Q-learning | Synapse