What question did this study set out to answer?

The study assessed whether large language models can extract prognostic information from pathology reports for risk stratification in oncology.

May 16, 2026 · Open Access

SCRIPT: Stratified clinical risk prediction from pathology reports using large language models

Key Points

Aimed to assess whether large language models can extract prognostic information from pathology reports for risk stratification in oncology.
Used the open-weight LLaMA 3.3 70B model to generate binary risk scores from publicly available pathology reports.
Evaluated associations between LLM-generated scores and survival outcomes across three gastrointestinal cancer types.
Conducted multivariate analysis, which confirmed the LLM-generated risk score as an independent prognostic factor for progression-free survival.
In colorectal cancer, LLM-generated risk scores showed significant prognostic value for overall survival (HR = 2.77, 95% CI = 1.92–3.97, p < 0.001).
Progression-free survival was significantly predicted by LLM-generated scores (HR = 2.93, 95% CI = 2.11–4.08, p < 0.001).
Disease-specific survival demonstrated strong prognostic value with LLM scores (HR = 5.85, 95% CI = 3.66–9.36, p < 0.001).
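
The hazard ratios above can be checked for internal consistency with a few lines of arithmetic. The sketch below assumes the reported intervals are Wald-type 95% confidence intervals on the log hazard ratio, as is typical for Cox regression output; the study summary does not state this, so treat it as an assumption. The function name `wald_check` is illustrative. From each interval it recovers the standard error of the log hazard ratio, the z statistic, and an approximate two-sided p-value.

```python
import math

def wald_check(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-HR standard error, z statistic, and two-sided p-value
    from a reported hazard ratio and its 95% confidence interval.

    Assumes a Wald-type interval: log(hr) +/- z_crit * se.
    """
    # On the log scale the interval width equals 2 * z_crit * se
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value under a standard normal reference distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values as reported for colorectal cancer: (HR, lower CI bound, upper CI bound)
reported = {
    "overall survival": (2.77, 1.92, 3.97),
    "progression-free survival": (2.93, 2.11, 4.08),
    "disease-specific survival": (5.85, 3.66, 9.36),
}

results = {name: wald_check(*vals) for name, vals in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: se={se:.3f}, z={z:.2f}, p={p:.1e}")
```

Under this assumption, all three z statistics exceed 5, which agrees with the reported p < 0.001 for each endpoint.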

Abstract

Accurate risk stratification in oncology is essential for guiding treatment decisions, yet current algorithms rely on a narrow set of structured variables and may therefore miss the rich signal in narrative pathology reports. These reports contain nuanced morphological descriptions and expert clinical judgment, but this information remains largely unused in clinical decision-making because it is locked in free-text prose. We hypothesized that large language models (LLMs) could extract prognostic information from complete free-text pathology reports and convert it into a binary survival biomarker. We used the open-weight LLaMA 3.3 70B model to generate risk scores directly from publicly available pathology reports across three gastrointestinal cancer types. The model was prompted to synthesize each complete narrative report into a binary prognostic score. We evaluated associations between the LLM-generated scores and survival outcomes, including overall survival, progression-free survival, and disease-specific survival. In colorectal cancer, LLM-generated risk scores demonstrated significant prognostic value for overall survival (hazard ratio (HR) = 2.77, 95% confidence interval (CI) = 1.92–3.97, p < 0.001), progression-free survival (HR = 2.93, 95% CI = 2.11–4.08, p < 0.001), and disease-specific survival (HR = 5.85, 95% CI = 3.66–9.36, p < 0.001). Multivariate analysis confirmed the LLM-generated risk score as an independent prognostic factor for progression-free survival. These results indicate that LLMs can turn narrative pathology reports into a single, independent survival biomarker. This approach leverages routinely available free-text documentation without requiring additional tissue analysis or pathologist workload, providing a deployable method to enhance risk stratification for treatment decision-making.
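
The prompting step described in the abstract could be wired up roughly as follows. This is a minimal sketch, not the study's method: the prompt wording, the HIGH/LOW label vocabulary, and the helper names `build_prompt` and `parse_risk_label` are all hypothetical, and the call to the model itself is deliberately left out. It only shows how a full report might be wrapped in an instruction and how a free-text answer might be mapped to a binary score.

```python
import re

# Hypothetical instruction text; the study's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You are assisting with oncology risk stratification.\n"
    "Read the full pathology report below and answer with exactly one word, "
    "HIGH or LOW, for the patient's prognostic risk.\n\n"
    "Report:\n{report}\n\nRisk:"
)

def build_prompt(report_text: str) -> str:
    """Wrap one complete free-text pathology report in the instruction template."""
    return PROMPT_TEMPLATE.format(report=report_text.strip())

def parse_risk_label(model_output: str):
    """Map model output to 1 (high risk), 0 (low risk), or None if unclear.

    Returns None when the output names both labels or neither, so that
    ambiguous answers are flagged rather than silently assigned to a group.
    """
    found = re.findall(r"\b(high|low)\b", model_output, re.IGNORECASE)
    labels = {label.upper() for label in found}
    if labels == {"HIGH"}:
        return 1
    if labels == {"LOW"}:
        return 0
    return None
```

Returning None for ambiguous output is a design choice in this sketch: it keeps unparseable answers out of the survival analysis until they are reviewed or the query is repeated.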
