Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm | Synapse