What type of study is this?

This is an experimental study.

September 24, 2025 · Open Access

MedGR²: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning

Key Points

MedGR² improves generalization in medical reasoning tasks through generative reward learning.
Experiments show that supervised fine-tuning on MedGR²-produced data surpasses baselines trained on large-scale, human-curated datasets.
When this data is used for reinforcement learning, the model achieves state-of-the-art cross-modality and cross-task generalization.
The approach reframes data scarcity as a data generation problem, opening the way to more robust medical AI systems.

Abstract

The application of Vision-Language Models (VLMs) in medicine is critically hampered by the scarcity of high-quality, expert-annotated data. Supervised Fine-Tuning (SFT) on existing datasets often leads to poor generalization on unseen modalities and tasks, while Reinforcement Learning (RL), a promising alternative, is stymied by the lack of reliable reward signals in this data-scarce domain. To break this impasse, we introduce Generative Reward Learning for Medical Reasoning (MedGR²), a novel framework that creates a self-improving virtuous cycle. MedGR² co-develops a data generator and a reward model, enabling the automated, continuous creation of high-quality, multi-modal medical data that serves as a superior training source for both SFT and RL. Our experiments demonstrate that SFT with MedGR²-produced data already surpasses baselines trained on large-scale, human-curated datasets. Crucially, when leveraging this data for RL via Group Relative Policy Optimization (GRPO), our model achieves state-of-the-art cross-modality and cross-task generalization, significantly outperforming specialized RL-based methods. Furthermore, our compact model, empowered by MedGR², achieves performance competitive with foundation models possessing over 10 times more parameters. MedGR² presents a new paradigm for data-efficient learning in high-stakes domains, transforming the problem from data scarcity to data generation and unlocking the full potential of RL for building truly generalizable medical AI.
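
For readers unfamiliar with GRPO, the core idea is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a separately learned value function. The following is a minimal sketch of that group-relative normalization only, not the authors' implementation. The function name, the `eps` constant, the use of the population standard deviation, and the example reward values are all illustrative assumptions; the abstract does not specify them.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration: each reward is centered on the
    group mean and scaled by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some libraries use sample std
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for a single prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Responses that score above the group average receive positive advantages and are reinforced; those below receive negative advantages. When all responses in a group receive the same reward, every advantage is zero and the group contributes no learning signal, which is one reason the quality of the reward signal matters.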


Cite This Study

Zhi et al. studied this question.

synapsesocial.com/papers/68d6e0fc8b2b6861e4c3f45b https://doi.org/10.48550/arxiv.2508.20549
