e13673
Background: Multi-Disciplinary Team Meetings (MDTMs) are the “gold standard” in the UK cancer care continuum. Multiple fragmented systems complicate MDTM workflows because data are not integrated and systems are not coordinated. A proof-of-concept, exploratory, comparative analysis of an Oncology Intelligence Platform benchmarked across multiple Large Language Models (LLMs) was conducted prospectively to support treatment decision making in the Breast Cancer MDTM. The aim was to identify the most sustainable and scalable real-world LLM, i.e., the one with maximal accuracy, minimal hallucination rate and manageable GPU (Graphics Processing Unit) usage.

Methods: A retrospective validation dataset of 225 matched National Health Service England (NHSE) radiology and histopathology text reports was curated from 2018 to 2024. The reports came from randomly selected, heterogeneous patients with confirmed and/or suspected breast cancer, covering a wide range of disease stages (I to IV, early to advanced) and objective disease findings. The data were unimodal (text) and multi-source, and primarily unstructured with some structured content. Of the 225 reports, 125 were radiology reports (ultrasound, mammogram, breast MRI, CT and bone scans) and the remaining 100 were histopathology and/or supplementary reports. Oncoflow™, an Artificial Intelligence (AI)-powered Cancer MDTM Coordinator CoPilot tool, was implemented. The platform ran multiple LLMs synchronously for treatment matching, i.e., mapping the objective data extracted from reports to recommendations from clinician-preferred knowledge sources. These comprised the European Society for Medical Oncology (ESMO) and National Institute for Health and Care Excellence (NICE) clinical practice guidelines and the Breast Systemic Anti-Cancer Therapy (SACT) Protocols from the Clatterbridge NHS Cancer Centre.

Results: Six LLMs were benchmarked in a sandbox environment. LLMs 1 and 4 were base (foundational), domain-specific (medical), proprietary models, whereas the remaining four were instruction-tuned, open-source, general-purpose, multilingual models. LLM 4 was autoencoding (encoder-only), whereas the rest were autoregressive (decoder-only). All six LLMs were fine-tuned, and a proprietary data processing strategy was used to adapt them to the task. LLM 2 achieved a guideline (ESMO/NICE) accuracy of 92%, a SACT protocol accuracy of 100%, a hallucination rate of 3.4% and GPU memory usage of 18 GB, outperforming the other models and emerging as the preferred choice.

Conclusions: Rising cancer incidence and rapidly evolving, evidence-based treatment decision-making needs have placed insurmountable pressure on MDTMs. The 10 Year Health Plan aims to “make the NHS the most AI-enabled health system”, “with AI seamlessly integrated into clinical pathways”. This, in turn, invites the incorporation of such tools into real-world clinical workflows through electronic health record integration.
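The Aim names three selection criteria (accuracy, hallucination rate and GPU usage) but does not state how they were combined to choose a model. The Python sketch below shows one plausible way to apply them, assuming a hard GPU-memory budget first, then ranking by the mean of guideline and SACT accuracy, with the lower hallucination rate breaking ties. The class, field names, the definition of "hallucinated output" and the ranking order are all hypothetical illustrations, not the Oncoflow™ method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical per-model benchmark summary; field names are assumptions."""
    name: str
    guideline_correct: int     # ESMO/NICE recommendations judged correct
    guideline_total: int
    sact_correct: int          # SACT protocol matches judged correct
    sact_total: int
    hallucinated_outputs: int  # assumed: outputs containing unsupported content
    total_outputs: int
    gpu_memory_gb: float       # assumed: peak GPU memory during inference


def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage; reject empty denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def select_model(results: Sequence[BenchmarkResult],
                 gpu_budget_gb: float) -> Optional[BenchmarkResult]:
    """Hypothetical selection rule: keep models within the GPU budget, then
    maximise mean accuracy, breaking ties by lower hallucination rate."""
    feasible = [r for r in results if r.gpu_memory_gb <= gpu_budget_gb]
    if not feasible:
        return None  # no model fits the available GPU memory

    def score(r: BenchmarkResult) -> tuple:
        guideline_acc = rate(r.guideline_correct, r.guideline_total)
        sact_acc = rate(r.sact_correct, r.sact_total)
        hallucination = rate(r.hallucinated_outputs, r.total_outputs)
        # Tuples compare element-wise: higher mean accuracy wins first,
        # then the negated hallucination rate favours fewer hallucinations.
        return ((guideline_acc + sact_acc) / 2, -hallucination)

    return max(feasible, key=score)
```

Treating GPU memory as a hard constraint rather than a weighted term reflects the "manageable GPU usage" wording: a model that does not fit the available hardware is not deployable regardless of its accuracy. A weighted score would be an equally valid reading if the trade-off were meant to be continuous.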
Benny et al. studied this question.