Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning