What question did this study set out to answer?

This research aims to validate an AI-based clinical decision support system for optimizing surgical extent in thyroid cancer.

May 29, 2026

Validation of a generative AI-based clinical decision support system for surgical extent in thyroid cancer in Brazil: Potential aid to de-escalating treatment to preserve function.

Key Points

This research aims to validate an AI-based clinical decision support system for optimizing surgical extent in thyroid cancer.
Retrospective analysis of a thyroid cohort (N=100)
Comparison of surgical recommendations by AI and senior surgeon against NCCN guidelines
Assessment metrics included Accuracy, Precision, Recall, and AUC
The AI model achieved an Accuracy of 0.894 and an AUC of 0.895, outperforming the clinician (Accuracy 0.532; AUC 0.542)
The AI reached a Recall of 0.833 (83.3%) for TL candidates, versus 0.083 (8.3%) for the clinician
The AI's recommendations could potentially increase partial thyroidectomies, benefiting thyroid function preservation

Abstract

1604 Background: The decision between thyroid lobectomy (TL) and total thyroidectomy (TT) for differentiated thyroid carcinoma involves balancing oncological control with morbidity. While NCCN guidelines support TL for low-risk cases, clinical practice often favors TT due to risk aversion, exposing patients to lifelong levothyroxine dependence. We validated "Korina," a clinical decision support system based on the Gemini Large Language Model and developed at Instituto do Câncer do Ceará (ICC), Brazil, to assess its ability to optimize surgical decision-making and identify candidates suitable for organ-preserving surgery.

Methods: We validated the AI model using a retrospective thyroid cohort (N = 100). The median age was 47 years (IQR 38–55), 87% were female, and histology was predominantly papillary thyroid carcinoma (99%), with 1% follicular. The median reported nodule size was approximately 1.7 cm (IQR 1.2–2.4). We (three independent medical evaluators, one head and neck specialist, blinded) compared the surgical recommendations of the AI and of a senior head and neck surgeon against NCCN guidelines (ground truth). The primary outcome was the accuracy of the indicated surgical extent (TL vs. TT). Performance metrics included AUC, Accuracy, Precision, and Recall (Sensitivity) for identifying TL candidates.

Results: In the validation analysis, the AI model demonstrated superior concordance with guidelines compared to standard clinical assessment. The AI achieved an Accuracy of 0.894 and an AUC of 0.895, significantly outperforming the clinician (Accuracy 0.532; AUC 0.542). Critical analysis revealed that the clinician had a high rate of unnecessary TT recommendations, with a Recall of only 0.083 (8.3%) for the TL class. In contrast, the AI model achieved a Recall of 0.833 (83.3%) for TL.

Conclusions: The Gemini-based system (to the best of our knowledge, the first in Brazil) outperformed standard clinical judgment in adhering to NCCN guidelines. The clinician demonstrated a bias toward aggressive surgery, missing over 90% of eligible lobectomy cases. Implementing this AI tool could potentially increase the rate of partial thyroidectomies safely, directly benefiting patients by preserving thyroid function and eliminating the lifelong burden of hormone replacement therapy for a significant proportion of cases.

Performance metrics of AI vs. clinician (reference: NCCN).

| Model | Precision | Recall (Sensitivity) | F1-Score | Accuracy | AUC |
|---|---|---|---|---|---|
| AI (Gemini) | 0.952 | 0.833 | 0.889 | 0.894 | 0.895 |
| Clinician | 1.000 | 0.083 | 0.154 | 0.532 | 0.542 |
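The reported metrics all follow from a 2×2 confusion matrix with TL as the positive class, and the arithmetic can be sketched in a few lines of Python. The abstract does not report the confusion matrix or the number of cases that entered this analysis, so the counts below are an assumption: they are one set of integers that happens to reproduce the published figures after rounding, and they are not the authors' data. The `Confusion` class and `summary` helper are hypothetical names introduced only for this illustration. The sketch also assumes that AUC was computed from hard TL/TT recommendations, in which case it reduces to the mean of sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts with TL (lobectomy) treated as the positive class."""
    tp: int  # guideline says TL, rater recommended TL
    fp: int  # guideline says TT, rater recommended TL
    fn: int  # guideline says TL, rater recommended TT
    tn: int  # guideline says TT, rater recommended TT

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def auc_hard_labels(self) -> float:
        # With one binary decision per case there is a single ROC operating
        # point, so AUC equals the mean of sensitivity and specificity.
        return (self.recall() + self.specificity()) / 2


def summary(c: Confusion) -> dict:
    """Return the five reported metrics rounded to three decimals."""
    return {
        "precision": round(c.precision(), 3),
        "recall": round(c.recall(), 3),
        "f1": round(c.f1(), 3),
        "accuracy": round(c.accuracy(), 3),
        "auc": round(c.auc_hard_labels(), 3),
    }


# ASSUMED counts (not reported in the abstract), chosen only because they
# reproduce the published table after rounding.
ai = Confusion(tp=20, fp=1, fn=4, tn=22)
clinician = Confusion(tp=2, fp=0, fn=22, tn=23)

print("AI:       ", summary(ai))
print("Clinician:", summary(clinician))
```

Under these assumed counts, the clinician's perfect Precision alongside a Recall of 0.083 reflects very few TL recommendations, all of them guideline-concordant, which is why Precision alone says little about missed lobectomy candidates.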
