What does this research mean for the field?

Leading artificial intelligence models generate oncology treatment recommendations that align closely with those of multidisciplinary expert panels, supporting their potential as clinical decision support tools. Novelty: incremental. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to evaluate the alignment of AI-generated recommendations with those from a multidisciplinary review (MDR) panel for complex cancer cases.

May 30, 2026

Using artificial intelligence (AI) as a decision support tool in clinic.

Key Points

This study aims to evaluate the alignment of AI-generated recommendations with those from a multidisciplinary review (MDR) panel for complex cancer cases.
Reviewed 261 complex cancer cases previously adjudicated by an MDR panel from 2020 to 2021.
Utilized three AI models (ChatGPT 4.5, Claude Opus 4, Gemini Ultra) to analyze cases using a proprietary prompting method.
Scored recommendations across six domains: completeness, reasoning, clarity, menu of options, recency, and relevance.
AI systems showed high alignment with MDR recommendations overall, excelling in recency but not in completeness.
Minor differences in options from AI were unlikely to lead to clinically meaningful changes in management.
Median per-case scores by cancer type ranged from 20.5 to 26 out of 30, indicating strong performance across all three models.

Abstract

e13647

Background: Multidisciplinary reviews (MDR) can alter the management of cancer cases. We previously presented more than 400 real-world cases across 5 cancer types to an expert MDR panel comprising radiology, medical, surgical, radiation, and hematologic oncology. Here, we evaluate the alignment and competence of recommendations generated by 3 leading AI models relative to those made by an MDR panel.

Methods: We reviewed 261 complex cases in breast, lung, hematologic (heme), gastrointestinal (GI), and genitourinary (GU) cancers previously adjudicated by an MDR panel between 2020 and 2021 from a larger cancer database. Cases were analyzed by AI models (OpenAI’s ChatGPT 4.5, Anthropic’s Claude Opus 4, and Google’s Gemini Ultra) using PrecisCa’s proprietary prompting method. Individual AI-generated recommendations from each model were scored against the MDR panel recommendations on a scale of 1-5 (5 highest) across 6 domains: completeness, reasoning, clarity, menu of options, recency, and relevance. The maximum achievable score was 30 per case, yielding a total achievable aggregate score of 7,830. Final AI recommendations were also compared to National Comprehensive Cancer Network (NCCN) guidelines for discrepancies. Reverse comparisons of additional AI-recommended options not identified by the MDR panel were not performed because of interval updates in the past 5 years.

Results: Across the board (Table 1), AI systems excelled in recency but not in completeness. While variability existed among the 3 AI models, alignment with MDR expert recommendations was high. Discordant cases reflected minor differences in option selection and were unlikely to have resulted in clinically meaningful changes in management.

Conclusions: This study demonstrates a high degree of alignment between recommendations generated by 3 leading AI models and those of an MDR panel across multiple complex cancer cases. These findings support the potential role of AI as a clinical decision support tool when used in conjunction with human experts’ review, rather than as a replacement for multidisciplinary care.

Table 1. Characteristics and aggregate/median competence score (range) by cancer type. Model columns show aggregate score/median per-case score (range).

| Cancer Type | n | Histology (%) | ChatGPT 4.5 | Claude Opus 4 | Gemini Ultra |
| Breast | 70 | Ductal 90; Lobular 10 | 1868/25.5 (21-30) | 1940/23.5 (17-30) | 1965/25.5 (21-30) |
| Lung | 70 | Non-small cell 92.9; Small cell 7.1 | 1860/25 (20-30) | 1942/25 (20-30) | 1971/26 (22-30) |
| Heme | 38 | Hodgkin lymphoma 13.2; Leukemia 10.5; Multiple myeloma 36.8; Non-Hodgkin lymphoma 39.5 | 849/22.5 (15-30) | 932/22.5 (15-30) | 964/22.5 (15-30) |
| GI | 48 | Anal 6.25; Colorectal 43.8; Esophageal 12.5; Gastric 6.25; Hepatobiliary 10.4; Pancreatic 20.8 | 1231/20.5 (11-30) | 1249/20.5 (11-30) | 1264/21.5 (13-30) |
| GU | 35 | Bladder 20; Kidney 31.4; Prostate 42.9; Testicular 5.7 | 880/23.5 (17-30) | 931/23.5 (17-30) | 889/24 (18-30) |
| Total | 261 | | 6688/20.5 (11-30) | 6994/20.5 (11-30) | 7053/21.5 (13-30) |
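
The scoring arithmetic in the Methods and Table 1 can be checked with a short sketch. The Python below is illustrative only: the figures are transcribed from Table 1, while the variable names, the `summarize` helper, and the "share of maximum" percentages are assumptions introduced here and are not reported in the abstract.

```python
# Illustrative check of the abstract's scoring arithmetic.
# Numbers are transcribed from Table 1; names and the percentage
# calculation are assumptions for this sketch, not the authors' method.

DOMAINS = 6          # completeness, reasoning, clarity, menu of options, recency, relevance
MAX_PER_DOMAIN = 5   # each domain scored 1-5

# Cases per cancer type (column "n" in Table 1)
CASES = {"Breast": 70, "Lung": 70, "Heme": 38, "GI": 48, "GU": 35}

# Aggregate scores per model and cancer type (first number in each Table 1 cell)
AGGREGATE = {
    "ChatGPT 4.5": {"Breast": 1868, "Lung": 1860, "Heme": 849, "GI": 1231, "GU": 880},
    "Claude Opus 4": {"Breast": 1940, "Lung": 1942, "Heme": 932, "GI": 1249, "GU": 931},
    "Gemini Ultra": {"Breast": 1965, "Lung": 1971, "Heme": 964, "GI": 1264, "GU": 889},
}

max_per_case = DOMAINS * MAX_PER_DOMAIN      # maximum score for a single case
total_cases = sum(CASES.values())            # total number of cases reviewed
max_aggregate = max_per_case * total_cases   # maximum aggregate score per model


def summarize(model: str) -> tuple[int, float]:
    """Return (total aggregate score, fraction of the maximum) for one model."""
    total = sum(AGGREGATE[model].values())
    return total, total / max_aggregate


if __name__ == "__main__":
    print(f"Max per case: {max_per_case}, cases: {total_cases}, max aggregate: {max_aggregate}")
    for model in AGGREGATE:
        total, share = summarize(model)
        print(f"{model}: {total}/{max_aggregate} ({share:.1%} of maximum)")
```

Under these assumptions, the per-type aggregates sum to the totals in the last row of Table 1, and the maximum aggregate matches the 7,830 stated in the Methods.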
