What question did this study set out to answer?

The aim is to explore how reinforcement learning and intermediate reasoning can improve CXR interpretation in vision-language models.

April 21, 2026 · Open Access

RadVLM-GRPO: enhancing chest X-ray report generation and visual grounding via reinforcement learning

Key Points

Conducted large-scale supervised fine-tuning on chest X-ray data to develop a new RadVLM model.
Implemented Group Relative Policy Optimization with task-specific rewards for enhanced report generation and visual grounding.
Ran matched RL experiments on domain-specific and general-domain Qwen3-VL variants, with and without thinking.
Reinforcement learning yielded additional gains on both report generation and visual grounding.
RL-optimized RadVLM models surpassed baseline models, achieving state-of-the-art performance.
Explicit thinking did not enhance outcomes beyond the gains from strong supervised fine-tuning and reinforcement learning.

Abstract

Recent advances in vision-language models (VLMs) have improved chest X-ray (CXR) interpretation in several respects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), which optimizes next-token prediction without evaluating answer quality. In contrast, reinforcement learning (RL) can incorporate task-specific feedback, and its combination with explicit intermediate reasoning ("thinking") has demonstrated substantial gains on verifiable math and coding tasks. To investigate the effects of RL and thinking in a CXR VLM, we perform large-scale SFT on CXR data to build an updated RadVLM based on Qwen3-VL, followed by a cold-start SFT stage that equips the model with basic thinking ability. We then apply Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards for report generation and visual grounding, and run matched RL experiments on both domain-specific and general-domain Qwen3-VL variants, with and without thinking. Across these settings, we find that while strong SFT remains crucial for high base performance, RL provides additional gains on both tasks, whereas explicit thinking does not appear to further improve results. Under a unified evaluation pipeline, the RL-optimized RadVLM models outperform their baseline counterparts and reach state-of-the-art performance on both report generation and grounding, highlighting clinically aligned RL as a powerful complement to SFT for medical VLMs.
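
For readers unfamiliar with GRPO, the core idea is that several responses are sampled for the same input and each response's reward is scored relative to the others in its group. The following is a minimal sketch of that normalization step only, not the authors' implementation: the function name, the reward values, the use of the population standard deviation, and the `eps` constant are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per sampled response to the SAME prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reports sampled for one chest X-ray.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)

# Responses that score above the group mean get a positive advantage,
# those below get a negative one, and ties with the mean get zero.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because the advantage is relative to the group, no separate value network is needed for this step; if every sampled response receives the same reward, all advantages are zero and that group contributes no learning signal.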

Gundersen et al. studied this question.

synapsesocial.com/papers/69e7138bcb99343efc98d017 https://doi.org/10.21256/zhaw-36426
