Reliable and efficient post-training of large reasoning models: reinforcement learning dynamics and adaptation | Synapse