Mathematical guarantees for trust region policy optimization | Synapse