Uncalibrated Reasoning: GRPO Induces Overconfidence for Stochastic Outcomes | Synapse