Mirror Descent Safe Policy Optimization for Reinforcement Learning Agents | Synapse