What question did this study set out to answer?

The aim is to assess whether large language models can extract specific disease involvement details for ESGO reports in advanced ovarian cancer.

March 3, 2026 | Open Access

Automated Extraction of ESGO Operative Report Fields from Free-Text Surgical Notes Using Large Language Models in Advanced Ovarian Cancer

Key Points

The aim is to assess whether large language models can extract specific disease involvement details for ESGO reports in advanced ovarian cancer.
Operative notes (n = 300) were collected retrospectively from a tertiary ESGO-accredited center.
LLMs were tasked with identifying disease involvement at 35 predefined ESGO anatomical sites.
LLM output was compared with expert annotations using F1 scores.
Optimization strategies were applied to improve model performance.
The two top-performing models achieved F1 scores of 0.851 and 0.864 before optimization.
After optimization, F1 scores rose to 0.897 and 0.875.
Performance was highest for clinically key sites such as the omentum and ovaries, and lower for complex sites such as the bowel.
Optimization reduced common errors involving laterality and ambiguous terms.
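
As a rough illustration of the comparison with expert annotations, the F1 score for paired yes/no labels can be sketched as below. This is a minimal sketch, not the study's evaluation code: the function name and the toy labels are hypothetical, and the source does not say how scores were pooled across sites or how the confidence intervals were computed.

```python
def f1_score(expert, predicted):
    """F1 for paired binary labels, where True means 'disease present'."""
    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e and p)        # both say present
    fp = sum(1 for e, p in pairs if not e and p)    # model says present, expert says absent
    fn = sum(1 for e, p in pairs if e and not p)    # model misses expert-confirmed disease
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for five note/site pairs (not study data).
expert = [True, True, False, True, False]
predicted = [True, False, False, True, True]
score = f1_score(expert, predicted)
```

In this toy case the model has two true positives, one false positive, and one false negative, so precision and recall are both 2/3.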

Abstract

Objective To determine whether large language models (LLMs) can automatically extract organ-level disease involvement to populate the Surgical Findings section of the European Society of Gynaecological Oncology (ESGO) Operative Report for advanced ovarian cancer.

Methods We retrospectively collected 300 operative notes from cytoreductive surgeries performed at a tertiary ESGO-accredited center. Each note was interrogated to identify disease involvement across 35 predefined ESGO anatomical sites. For each site, LLMs were tasked with classifying whether disease was present. Their accuracy was compared with expert annotations using F1 scores. Four modern models were selected based on their state-of-the-art performance and suitability for clinical text interpretation. Operative notes were converted into sets of binary (yes/no) questions, one for each anatomical site. Models were tested both in their basic form and after targeted enhancement strategies to reduce common errors. These enhancements included adding a clinical terminology list, providing clearer task instructions, and showing a small number of examples.

Results The models showed good baseline accuracy, with the two top-performing systems achieving F1 scores of 0.851 (95% CI: 0.841–0.861) and 0.864 (95% CI: 0.854–0.873). Following optimization strategies, accuracy increased further, reaching 0.897 (95% CI: 0.888–0.906) and 0.875 (95% CI: 0.866–0.884). Performance was highest for clinically key sites, including omentum, right diaphragm (95%), and ovaries (92%). Lower accuracy was observed for complex anatomical sites such as bowel (small 73%, large 61%) and peritoneal sites (pouch of Douglas 82%, abdominal wall 68%). Frequent errors involved laterality, overlapping anatomical regions, and ambiguous abbreviations. Optimization strategies improved the distinction between closely related sites (rectosigmoid vs large bowel/mesentery) and reduced left/right errors.

Conclusion With enhancement strategies, LLMs demonstrated near-human performance in extracting ESGO-compliant operative information. Integrating model-assisted extraction into surgical workflows may reduce reporting time, improve completeness, and help standardize operative documentation.
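
The Methods describe turning each note into per-site yes/no questions and then adding a terminology list, clearer instructions, and a few examples. One way this could look is sketched below. Everything here is an assumption for illustration: the function names, prompt wording, glossary entry, example note, and shortened site list are hypothetical and are not taken from the study, and no model call is shown.

```python
# Illustrative subset of sites named in the abstract; the study used 35 ESGO sites.
SITES = ["omentum", "right diaphragm", "ovaries", "small bowel", "large bowel"]


def build_prompt(note, site, glossary=None, examples=None):
    """Build one binary question for one anatomical site (hypothetical wording)."""
    parts = [
        "You are reading an operative note from cytoreductive surgery "
        "for advanced ovarian cancer.",
        f"Answer strictly 'yes' or 'no': does the note describe disease "
        f"involving the {site}?",
    ]
    if glossary:  # enhancement 1: clinical terminology list
        parts.append("Terminology:")
        parts.extend(f"- {abbr}: {meaning}" for abbr, meaning in glossary.items())
    for ex_note, ex_site, ex_answer in examples or []:  # enhancement 2: few examples
        parts.append(f"Example note: {ex_note}\nSite: {ex_site}\nAnswer: {ex_answer}")
    parts.append(f"Note: {note}\nSite: {site}\nAnswer:")
    return "\n".join(parts)


def parse_answer(reply):
    """Map a free-text model reply to True/False, or None if it is unclear."""
    text = reply.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    return None


# Hypothetical inputs, not study data.
glossary = {"POD": "pouch of Douglas"}
examples = [("Tumour plaques on the right hemidiaphragm.", "right diaphragm", "yes")]
note = "Omental cake present. Both ovaries replaced by tumour."
prompts = [build_prompt(note, site, glossary, examples) for site in SITES]
```

Asking one question per site keeps each answer independently checkable against the expert annotation for that site, at the cost of one model call per note and site.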
