Maximum Entropy Softmax Policy Gradient via Entropy Advantage Estimation | Synapse