Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | Synapse