Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data | Synapse