Offline Regularised Reinforcement Learning for Large Language Models Alignment