What question did this study set out to answer?

The study aims to assess how closely AI-generated recommendations align with those from multidisciplinary tumor boards in gynecologic oncology.

February 2, 2026 · Open Access

Discrepancies Between MDT Recommendations and AI-Generated Decisions in Gynecologic Oncology: A Retrospective Comparative Cohort Study

Key Points

The study aims to assess how closely AI-generated recommendations align with those from multidisciplinary tumor boards in gynecologic oncology.
The single-center retrospective analysis included 599 consecutive patients with cervical, endometrial, ovarian, or vulvar cancer.
Standardized anonymized case summaries were entered into ChatGPT 5.0, which was instructed to follow current ESGO guidelines.
AI-generated staging and treatment recommendations were compared with MDT decisions.
Two reviewers independently assessed discrepancies and stratified them by malignancy type, disease stage, and treatment domain.
Overall concordance for FIGO staging was 77.0%.
Treatment decisions were less often discordant than staging, with discordance of 8.2% for chemotherapy and 6.8% for targeted therapy.
Highest staging disagreement occurred in early-stage endometrial cancer at 32.6%.
In recurrent ovarian and cervical cancer, discrepancies were more pronounced in surgical and systemic therapy recommendations.
Vulvar cancer cases demonstrated the highest overall agreement with MDT recommendations.

Abstract

Background: Multidisciplinary tumor boards (MDTs) remain the foundation of gynecologic cancer management, yet increasing diagnostic complexity and rapidly evolving molecular classifications have intensified interest in artificial intelligence (AI) as a potential decision-support tool. This study aimed to evaluate the concordance between MDT-derived recommendations and those generated by ChatGPT 5.0 across a large, real-world cohort of gynecologic oncology cases.

Methods: This single-center retrospective analysis included 599 consecutive patients with cervical, endometrial, ovarian, or vulvar cancer evaluated during MDT meetings over a 2-month period. Standardized anonymized case summaries were entered into ChatGPT 5.0, which was instructed to follow current ESGO guidelines. AI-generated staging and treatment recommendations were compared with MDT decisions. Discrepancies were independently assessed by two reviewers and stratified by malignancy type, disease stage, and treatment domain.

Results: Overall concordance for FIGO staging was 77.0%, while treatment-related decisions demonstrated lower discordance, particularly in chemotherapy (8.2%) and targeted therapy (6.8%). The highest staging disagreement occurred in early-stage endometrial cancer (32.6%), reflecting the complexity of newly revised molecular classifications. In recurrent ovarian and cervical cancer, discrepancies were more pronounced in surgical and systemic therapy recommendations, suggesting limited AI capacity to integrate multimodal imaging, prior treatments, and individualized considerations. Vulvar cancer cases showed the highest overall agreement.

Conclusions: ChatGPT 5.0 aligns with MDT decisions in many straightforward scenarios but falls short in complex or nuanced cases requiring contextual, multimodal, and patient-specific reasoning. These findings underscore the need for prospective, real-time evaluation, multimodal data integration, external validation, and explainable AI frameworks before LLMs can be safely incorporated into routine gynecologic oncology decision-making.
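
For readers who want to see how a stratified concordance rate of the kind reported above might be computed, the following is a minimal sketch and not the authors' analysis code. It assumes each case can be reduced to a (group, MDT decision, AI decision) triple and that concordance means an exact match. The function name `concordance_by_group` and the toy records are hypothetical and are not study data.

```python
from collections import defaultdict


def concordance_by_group(cases):
    """Return {group: percent of cases where the AI decision equals the MDT decision}.

    `cases` is an iterable of (group, mdt_decision, ai_decision) triples.
    Exact string equality is an assumed definition of concordance.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for group, mdt_decision, ai_decision in cases:
        totals[group] += 1
        if mdt_decision == ai_decision:
            matches[group] += 1
    return {g: round(100.0 * matches[g] / totals[g], 1) for g in totals}


# Hypothetical toy records for illustration only (NOT data from the study).
toy_cases = [
    ("endometrial", "IA", "IB"),
    ("endometrial", "IA", "IA"),
    ("vulvar", "IB", "IB"),
    ("vulvar", "II", "II"),
]

rates = concordance_by_group(toy_cases)
for group, pct in rates.items():
    print(f"{group}: {pct}% concordant, {round(100.0 - pct, 1)}% discordant")
```

Discordance is simply the complement of concordance, so a staging concordance of 77.0% corresponds to 23.0% discordance under this definition.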
