Supervised Fine-Tuning as Inverse Reinforcement Learning | Synapse