What question did this study set out to answer?

This research aims to tackle the challenges of tracking disease progression in chest X-rays by modeling image differences through a multi-task self-supervised framework.

March 2, 2026 | Open Access

MRID: Modeling Radiological Image Differences for Disease Progression Reasoning via Multi-Task Self-Supervision

Key Points

Proposes MRID, a multi-task self-supervised framework that follows a pretraining-finetuning paradigm
Aligns organs and pathological regions spatially across image pairs (intra-modal alignment)
Aligns visual difference representations with radiology report embeddings (cross-modal semantic alignment)
Introduces a data augmentation strategy to alleviate the imbalance of disease progression categories
Evaluates the framework on the Longitudinal-MIMIC and MS-CXR-T datasets
Captures fine-grained disease progression patterns in chest X-rays
Achieves competitive performance on single-image radiology report generation
Demonstrates enhanced alignment of visual and textual representations

Abstract

Automated radiology report generation has become a prominent research topic in medical multimodal learning. However, most existing approaches primarily focus on single-image interpretation and rarely address the task of tracking disease progression across longitudinal chest X-rays. This task presents two major challenges: accurately localizing pathological changes between temporally paired images, and effectively translating visual difference representations into clinically meaningful textual descriptions. To address these challenges, we propose MRID (Modeling Radiological Image Differences for Disease Progression Reasoning), a multi-task self-supervised framework that follows a pretraining–finetuning paradigm. MRID leverages multiple complementary self-supervised objectives to jointly achieve (1) intra-modal spatial alignment of organs and pathological regions across image pairs, and (2) cross-modal semantic alignment between visual difference representations and radiology report embeddings. Furthermore, we introduce a simple yet effective data augmentation strategy to alleviate the imbalance of disease progression categories. Extensive experiments conducted on the Longitudinal-MIMIC and MS-CXR-T datasets demonstrate that MRID effectively captures fine-grained disease progression patterns. In addition, the proposed framework achieves competitive performance on single-image radiology report generation, further highlighting its strong capability in modeling chest X-ray semantics.
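
The abstract does not specify which loss implements the cross-modal semantic alignment, so the following is only an illustrative sketch, not the authors' method. It assumes (hypothetically) that a visual difference representation is the element-wise difference between the current and prior image features, and that alignment with report embeddings is trained with a symmetric contrastive (InfoNCE-style) objective. The function names, the `temperature` value, and the toy vectors are all assumptions introduced for illustration.

```python
import math


def difference_features(prior, current):
    """Assumed difference representation: current minus prior, element-wise."""
    return [c - p for p, c in zip(prior, current)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(diff_feats, report_embs, temperature=0.1):
    """Symmetric contrastive loss between difference features and report
    embeddings; the i-th difference feature is paired with the i-th report."""
    logits = [
        [cosine(d, r) / temperature for r in report_embs]
        for d in diff_feats
    ]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = _diagonal_cross_entropy(logits)
    text_to_image = _diagonal_cross_entropy(transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy usage: two image pairs and two report embeddings (made-up values).
diffs = [
    difference_features([0.0, 1.0], [1.0, 1.0]),  # change along axis 0
    difference_features([1.0, 0.0], [1.0, 1.0]),  # change along axis 1
]
matched_reports = [[1.0, 0.0], [0.0, 1.0]]
shuffled_reports = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = symmetric_info_nce(diffs, matched_reports)
loss_shuffled = symmetric_info_nce(diffs, shuffled_reports)
```

In this toy setup the loss is lower when each difference feature is paired with the report embedding that points in the same direction, which is the behavior a cross-modal alignment objective of this kind is meant to encourage. A real system would compute the features with learned image and text encoders rather than hand-written vectors.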
