What type of study is this?

This is an experimental study.

October 20, 2025 | Open Access

Diversity-Incentivized Exploration for Versatile Reasoning

Key Points

The DIVER framework improves LLM reasoning by using global sequence-level diversity as an incentive for exploration.
A preliminary empirical study reveals a strong positive correlation between global diversity and reasoning capacity.
A potential-based reward shaping mechanism preserves optimal policy invariance, and simple heuristics mitigate possible reward hacking.
DIVER outperforms competitive RLVR baselines on both in-domain and out-of-domain tasks, in both Pass@1 and Pass@k evaluations.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a crucial paradigm for incentivizing reasoning capabilities in Large Language Models (LLMs). Due to vast state-action spaces and reward sparsity in reasoning tasks, existing methods often struggle with deficient exploration and poor sample efficiency. In this paper, we propose DIVER (Diversity-Incentivized Exploration for VersatilE Reasoning), a framework that highlights the pivotal role of global sequence-level diversity in incentivizing deep exploration for versatile reasoning. We first conduct a primary empirical study that reveals a strong positive correlation between global diversity and reasoning capacity. Building on this insight, we introduce global diversity incentives as an intrinsic reward to promote deep exploration in a semantically structured space. To incorporate the intrinsic reward, we develop a potential-based reward shaping mechanism that preserves optimal policy invariance, and we design simple heuristics to mitigate possible reward hacking. Experimental results show that DIVER outperforms competitive RLVR baselines with various exploration strategies on both in-domain and out-of-domain tasks, excelling in both Pass@1 and Pass@k evaluations. Our code is available at https://github.com/NJU-RL/DIVER.
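
The abstract's claim about policy invariance rests on a classical property of potential-based reward shaping: a shaping term of the form F(s, s') = γΦ(s') − Φ(s) telescopes along a trajectory, so it shifts the return by an amount that depends only on the first and last states. The sketch below illustrates that general property only. It is not DIVER's implementation: the potential function `toy_potential`, the string-valued states, and the helper names are hypothetical stand-ins, and the paper's actual diversity-based potential is not reproduced here.

```python
from typing import Callable, List, Sequence


def shaped_rewards(
    states: Sequence[str],
    rewards: Sequence[float],
    potential: Callable[[str], float],
    gamma: float,
) -> List[float]:
    """Add the shaping term gamma * phi(s') - phi(s) to each step reward.

    `states` holds s_0..s_T and `rewards` holds r_0..r_{T-1}.
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("expected one more state than rewards")
    return [
        r + gamma * potential(s_next) - potential(s)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over the trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def toy_potential(state: str) -> float:
    """Hypothetical potential for illustration: half the partial response length."""
    return 0.5 * len(state)


# A toy trajectory: the state is the partial response, reward arrives only at the end.
states = ["", "a", "ab", "abc"]
rewards = [0.0, 0.0, 1.0]
gamma = 0.9

original = discounted_return(rewards, gamma)
shaped = discounted_return(shaped_rewards(states, rewards, toy_potential, gamma), gamma)

# Telescoping: the shift depends only on the endpoints, not on intermediate steps.
horizon = len(rewards)
shift = gamma ** horizon * toy_potential(states[-1]) - toy_potential(states[0])
```

Here `shaped - original` equals `shift` up to floating-point error. When the potential of terminal states is defined to be zero, the shift reduces to −Φ(s_0), which is the same for every trajectory from a given start state, so the ranking of policies is unchanged.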

Authors

Z.-W. Hu

Shilin Zhang

Beijing Chemical Industry Research Institute (China)

Yingwu LI

Kunming University
