What question did this study set out to answer?

To evaluate the effectiveness of a Large Language Model (LLM) in extracting data for breast cancer MDTM case preparation.

May 30, 2026

Intelligence-driven data extraction for the breast cancer multidisciplinary team meeting case preparation: Results from a prospective external validation.

Key Points

To evaluate the effectiveness of a Large Language Model (LLM) in extracting data for breast cancer MDTM case preparation.
Prospective external validation of an AI-driven data extraction tool for MDTM, using a dataset of radiology reports from confirmed breast cancer patients provided by Barts Health NHS Trust and covering 2018 to 2024.
Reports were analyzed using a fine-tuned, domain-specific LLM, focusing on 21 extraction features classified into three tiers based on data type.
Mean Absolute Error for continuous numeric features (T1a) was reported as 96% to 99%, and for discrete ordinal features (T1b) as 74% to 93%.
F1 scores for multi-class categorical features (T2) ranged from 0.6 to 0.9 (Micro-F1 0.8 to 0.9, Macro-F1 0.6 to 0.8).
Exact match accuracy of the LLM for free text parameters (T3) ranged from 62% to 93%, demonstrating robust extraction capabilities.
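
Exact match, the metric in the last key point, is stricter than the token-level F1 that the abstract also reports for T3 parameters, which helps explain why the two ranges differ. The sketch below shows one common way to compute both for a single extracted string. It is an illustration only: the study does not describe its tokenization or normalization, so the whitespace tokenizer, the lowercasing, and the function names `normalize`, `exact_match` and `token_f1` are assumptions, and the example strings are invented.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase and split on whitespace.
    # The study does not state how its text was tokenized.
    return text.lower().split()


def exact_match(pred: str, gold: str) -> bool:
    # True only when every token matches, in the same order.
    return normalize(pred) == normalize(gold)


def token_f1(pred: str, gold: str) -> float:
    # Harmonic mean of token precision and token recall,
    # counting each shared token at most as often as it
    # appears in both strings.
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


# Invented example: a partially correct extraction.
gold = "left breast upper outer"
pred = "left breast"
print(exact_match(pred, gold))  # False
print(round(token_f1(pred, gold), 3))  # 0.667
```

A partially correct extraction scores zero on exact match but still earns credit under token-level F1, so exact match accuracy is expected to sit at or below token-level F1 on the same data.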

Abstract

e13669

Background: Cancer Multi-Disciplinary Team Meetings (MDTMs) are central to UK Cancer Pathways irrespective of patient case complexity. A major bottleneck for the MDT is the time-consuming, laborious manual review of clinical summaries and investigation reports needed to prepare MDTM cases. Large Language Models (LLMs) can extract critical, structured information from this vast, complex reservoir of unstructured text. After benchmark testing of multiple LLMs in an internal sandbox environment as part of an Oncology Intelligence Platform, the best-performing LLM (maximal accuracy, minimal hallucinations and minimal Graphics Processing Unit usage) was deployed for a data extraction task to prepare cases for the Breast Cancer MDTM. This was a prospective, external validation to assess LLM performance.

Methods: A retrospective dataset of structured and unstructured radiology report text for confirmed breast cancer patients, covering 2018 to 2024, was obtained from the Barts Health NHS Trust Data Platform. The reports came from multiple sources, including regional scans (Mammogram, Ultrasound and MRI Breast) and systemic scans (Staging/Response Assessment CT Chest Abdomen Pelvis and Bone Scans). An Artificial Intelligence (AI) powered Cancer MDTM CoPilot software platform (OncoFlow™) was applied to this data to extract a set of defined objective parameters, including clinical TNM (tumour-node-metastasis) classification points.

Results: 165 of these reports, spanning disease stages I to IV, were prospectively processed by OncoFlow's fine-tuned LLM, which is specific to the cancer data extraction task. This LLM was an open-source, domain-specific, multilingual, instruction-tuned (having undergone distillation and reinforcement learning), autoregressive transformer model. There were 21 extraction features, divided into 3 tiers based on data type: T1a (continuous numeric), T1b (discrete ordinal), T2 (categorical with intrinsic order) and T3 (free text), comprising 2, 4, 3 and 12 parameters respectively. Performance on T1 features was measured with Mean Absolute Error (MAE), reported as 96 to 99% for T1a and 74 to 93% for T1b. T2 features, being multi-class, were scored with F1: Micro-F1 (model performance over the whole dataset, with all classes pooled) ranged from 0.8 to 0.9 and Macro-F1 (model performance averaged across classes) ranged from 0.6 to 0.8. For T3 parameters, token-level F1, which combines precision and recall, ranged from 78 to 95%, and exact match accuracy ranged from 62 to 93%.

Conclusions: The LLM achieved robust, clinically relevant accuracy scores across all data tiers. These scores support the model's readiness to streamline and standardise MDTM case preparation.
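
The gap between the Micro-F1 and Macro-F1 ranges reported for T2 features is what one would expect when class frequencies are unbalanced. The following is a minimal sketch of the two averaging schemes for single-label, multi-class predictions. The function name `micro_macro_f1` and the labels in the example are invented for illustration, and the sketch is not taken from the study's evaluation code.

```python
from collections import Counter


def micro_macro_f1(gold: list[str], pred: list[str]) -> tuple[float, float]:
    # Assumes gold and pred have equal length and one label per item.
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Invented example: one frequent class and two rare ones.
gold = ["A"] * 8 + ["B", "C"]
pred = ["A"] * 8 + ["A", "C"]
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.9 0.647
```

In this toy case a single missed rare class barely moves Micro-F1 but pulls Macro-F1 down sharply, because every class carries equal weight in the macro average.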
