ESRL: Efficient Sampling-Based Reinforcement Learning for Sequence Generation