What type of study is this?

This is an experimental study.

September 24, 2025

Open Access

Uncalibrated Reasoning: GRPO Induces Overconfidence for Stochastic Outcomes

Key Points

GRPO induces overconfident probability predictions for binary stochastic outcomes, which degrades calibration.
In contrast, PPO and RLOO yield well-calibrated models in the same experiments.
Removing group standard normalization from GRPO fixes the miscalibration in its predictions.
The results provide new evidence against the use of standard normalization in GRPO and urge caution when applying RL methods to domains with stochastic outcomes.
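
To make the normalization step in the points above concrete, here is a minimal sketch of how group-relative advantages might be computed with and without division by the group standard deviation, next to a leave-one-out baseline in the style of RLOO. The function names, the `eps` constant, and the use of the population standard deviation are assumptions for illustration. They are not taken from the paper, and implementations differ in these details.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, normalize_std=True, eps=1e-6):
    """Group-relative advantages: reward minus the group mean,
    optionally divided by the group standard deviation.
    (Population std and eps are illustrative assumptions.)"""
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not normalize_std:
        return centered
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


def rloo_advantages(rewards):
    """Leave-one-out baseline: each sample is compared with the
    mean reward of the other samples in its group."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Two hypothetical groups whose rewards differ a lot in spread.
narrow = [0.9, 1.0, 0.9, 1.0]
wide = [0.0, 1.0, 0.0, 1.0]

for name, group in [("narrow", narrow), ("wide", wide)]:
    print(name, "with std norm:   ", grpo_advantages(group))
    print(name, "without std norm:", grpo_advantages(group, normalize_std=False))
    print(name, "leave-one-out:   ", rloo_advantages(group))
```

With standard normalization, both groups produce advantages of roughly the same magnitude even though their raw reward differences are ten times apart. Without it, the advantage scale follows the raw reward scale. This only illustrates the rescaling itself. The paper's theoretical argument for why it leads to overconfidence is not reproduced here.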

Abstract

Reinforcement learning (RL) has proven remarkably effective at improving the accuracy of language models in verifiable and deterministic domains like mathematics. Here, we examine if current RL methods are also effective at optimizing language models in verifiable domains with stochastic outcomes, like scientific experiments. Through applications to synthetic data and real-world biological experiments, we demonstrate that Group Relative Policy Optimization (GRPO) induces overconfident probability predictions for binary stochastic outcomes, while Proximal Policy Optimization (PPO) and REINFORCE Leave-One-Out (RLOO) yield well-calibrated models. We show that removing group standard normalization in GRPO fixes its miscalibration and provide a theoretical explanation for why normalization causes overconfidence. Our results provide new evidence against the use of standard normalization in GRPO and help pave the way for applications of RL for reasoning language models beyond deterministic domains.
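
As a rough illustration of what "well-calibrated" versus "overconfident" means for binary outcomes, the sketch below computes a binned expected calibration error (ECE) on a small hand-made dataset. ECE is a common calibration measure, but it is an assumption here that it resembles what the authors report. The data, the bin count, and the function name are hypothetical.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned ECE: weighted average over bins of
    |mean predicted probability - observed frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)
        freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(avg_p - freq)
    return ece


# Hypothetical stochastic outcome that occurs 7 times out of 10.
outcomes = [1] * 7 + [0] * 3

calibrated = [0.7] * 10      # predicts the true frequency
overconfident = [0.95] * 10  # predicts the likely outcome too strongly

print("calibrated ECE:   ", expected_calibration_error(calibrated, outcomes))
print("overconfident ECE:", expected_calibration_error(overconfident, outcomes))
```

The calibrated predictor's ECE is approximately 0, while the overconfident one's is approximately 0.25, the gap between the predicted 0.95 and the observed frequency of 0.7.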

Authors

Michael Bereket

Jure Leskovec
