July 19, 2025 · Open Access

Large language model integrations in cancer decision-making: a systematic review and meta-analysis

Key Points

Large language models are being integrated into oncology decision-making to support patients and clinicians.
The meta-analysis included 56 studies across 15 cancer types, showing varied performance and an average overall accuracy of 76.2%.
Most studies focused on quantitative evaluations, emphasizing accuracy while rarely evaluating safety or clarity.
Current limitations of large language models in cancer care highlight the need for better datasets and standardized assessments.

Abstract

Large Language Models (LLMs) are increasingly used to support cancer patients and clinicians in decision-making. This systematic review investigates how LLMs are integrated into oncology and how researchers evaluate them. We conducted a comprehensive search across PubMed, Web of Science, Scopus, and the ACM Digital Library through May 2024, identifying 56 studies covering 15 cancer types. The meta-analysis results suggested that LLMs were commonly used to summarize, translate, and communicate clinical information, but performance varied: the average overall accuracy was 76.2%, and the average diagnostic accuracy was lower, at 67.4%, revealing gaps in the clinical readiness of this technology. Most evaluations relied heavily on quantitative datasets and automated methods without human graders, emphasizing “accuracy” and “appropriateness” while rarely addressing “safety”, “harm”, or “clarity”. Current limitations of LLMs in cancer decision-making, such as limited domain knowledge and dependence on human oversight, demonstrate the need for open datasets and standardized evaluations to improve reliability.

Cite This Study

Hao et al. (2025) studied this question.

synapsesocial.com/papers/689a02c9e6551bb0af8cceb4 https://doi.org/10.1038/s41746-025-01824-7