What question did this study set out to answer?

The goal is to enhance the quality of automated histopathology report generation by addressing existing limitations in multi-scale visual context and semantic alignment.

March 28, 2026 · Open Access

Hierarchical context modeling and prototype-mediated cross-modal alignment for histopathology report generation

Key Points

The goal is to enhance the quality of automated histopathology report generation by addressing existing limitations in multi-scale visual context and semantic alignment.
Proposes HC-Gen, a framework that combines hierarchical context modeling with prototype-mediated cross-modal alignment.
Introduces a hierarchical context fusion module that integrates multi-scale visual-semantic context and the implicit hierarchy prior in WSIs.
Introduces a cross-modal prototypical memory module that uses learnable semantic prototypes as intermediate bridges for vision-language alignment.
Model performance was evaluated with natural language generation metrics and human evaluation.
HC-Gen outperformed state-of-the-art methods on two benchmark datasets.
The prototype-based design targets the semantic gap between the visual and language modalities to improve report quality.
Additional visualizations support the interpretability of the model's decision process.

Abstract

Histopathology in whole slide images (WSIs) serves as the gold standard for cancer diagnosis, with clinical reports playing a critical role in decision-making. However, conventional pathological examination is time-consuming, which has driven an urgent demand for automated report generation. Deep learning methods have the potential to meet this demand through Histopathology Report Generation (HRG). Nevertheless, existing HRG approaches produce low-quality reports because they explore the multi-scale visual context of gigapixel WSIs ineffectively and leave the inherent semantic gap between the heterogeneous vision and language modalities unaddressed. To address these challenges, we propose HC-Gen, a novel framework that combines hierarchical context modeling with prototype-mediated cross-modal alignment for HRG. Inspired by pathologists’ anatomically grounded diagnostic logic, we design a hierarchical context fusion module to integrate multi-scale visual-semantic context and the implicit hierarchy prior in WSIs. Furthermore, we propose a cross-modal prototypical memory module that establishes learnable semantic prototypes as intermediate bridges to achieve unified and efficient vision-language alignment. Model performance was assessed with natural language generation metrics and human evaluation. Extensive experiments on two benchmark datasets demonstrate that HC-Gen outperforms state-of-the-art methods. Additional visualizations support the interpretability of the decision process. Our code is available at: https://github.com/Modaoshuangming/HC-Gen
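The idea of using shared prototypes as a bridge between modalities can be illustrated with a small sketch. This is not HC-Gen's implementation (see the linked repository for that); it only shows the general pattern under stated assumptions: scaled dot-product attention over a shared prototype bank, random arrays standing in for learned embeddings, and hypothetical names (`query_prototypes`, `prototypes`, `visual_tokens`, `text_tokens`) and sizes chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def query_prototypes(features, prototypes):
    """Re-express each feature as an attention-weighted mix of shared prototypes.

    Assumption: scaled dot-product attention; the paper may use a different
    querying or responding mechanism.
    """
    dim = prototypes.shape[1]
    scores = features @ prototypes.T / np.sqrt(dim)  # (n_tokens, n_prototypes)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ prototypes, weights


rng = np.random.default_rng(0)
dim, num_prototypes = 16, 8

# In a trained model the prototypes would be learnable parameters;
# here they are random placeholders.
prototypes = rng.normal(size=(num_prototypes, dim))
visual_tokens = rng.normal(size=(5, dim))  # stand-in for WSI region embeddings
text_tokens = rng.normal(size=(7, dim))    # stand-in for report token embeddings

visual_aligned, visual_weights = query_prototypes(visual_tokens, prototypes)
text_aligned, text_weights = query_prototypes(text_tokens, prototypes)
```

Because both modalities are rewritten as convex combinations of the same prototype set, their outputs lie in a common space, which is the sense in which the prototypes act as an intermediate bridge in this sketch.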
