1604

Background: The decision between thyroid lobectomy (TL) and total thyroidectomy (TT) for differentiated thyroid carcinoma involves balancing oncological control against morbidity. While NCCN guidelines support TL for low-risk cases, clinical practice often favors TT because of risk aversion, exposing patients to lifelong levothyroxine dependence. We validated "Korina," a clinical decision support system based on the Gemini large language model and developed at Instituto do Câncer do Ceará (ICC), Brazil, to assess its ability to optimize surgical decision-making and identify candidates suitable for organ-preserving surgery.

Methods: We validated the AI model using a retrospective thyroid cohort (N = 100). The median age was 47 years (IQR 38–55), 87% of patients were female, and histology was predominantly papillary thyroid carcinoma (99%), with 1% follicular carcinoma. The median reported nodule size was approximately 1.7 cm (IQR 1.2–2.4). Blinded evaluators (three independent medical evaluators, one head and neck specialist) compared the surgical recommendations of the AI and of a senior head and neck surgeon against NCCN guidelines (ground truth). The primary outcome was the accuracy of the indicated surgical extent (TL vs. TT). Performance metrics included AUC, accuracy, precision, and recall (sensitivity) for identifying TL candidates.

Results: In the validation analysis, the AI model showed higher concordance with guidelines than standard clinical assessment. The AI achieved an accuracy of 0.894 and an AUC of 0.895, significantly outperforming the clinician (accuracy 0.532; AUC 0.542). The clinician had a high rate of unnecessary TT recommendations, with a recall of only 0.083 (8.3%) for the TL class. In contrast, the AI model achieved a recall of 0.833 (83.3%) for TL.

Conclusions: The Gemini-based system (to the best of our knowledge, the first in Brazil) outperformed standard clinical judgment in adhering to NCCN guidelines.
The clinician demonstrated a bias toward aggressive surgery, missing over 90% of eligible lobectomy cases. Implementing this AI tool could potentially, and safely, increase the rate of partial thyroidectomies, directly benefiting patients by preserving thyroid function and eliminating the lifelong burden of hormone replacement therapy for a significant proportion of cases.

Performance metrics of AI vs. clinician (reference: NCCN).

| Model | Precision | Recall (Sensitivity) | F1-Score | Accuracy | AUC |
|---|---|---|---|---|---|
| AI (Gemini) | 0.952 | 0.833 | 0.889 | 0.894 | 0.895 |
| Clinician | 1.000 | 0.083 | 0.154 | 0.532 | 0.542 |
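As a consistency check on the reported metrics, the sketch below recomputes the F1-score as the harmonic mean of the reported precision and recall for the TL class, and derives the share of eligible TL cases missed as 1 minus recall. It uses only the rounded values reported above. The function name `f1_score` and the dictionary layout are illustrative choices, not part of the study's pipeline. Because the inputs are rounded to three decimals, the recomputed F1 can differ from the reported value in the third decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported for the TL class (reference: NCCN).
reported = {
    "AI (Gemini)": {"precision": 0.952, "recall": 0.833, "f1": 0.889},
    "Clinician": {"precision": 1.000, "recall": 0.083, "f1": 0.154},
}

results = {}
for model, m in reported.items():
    recomputed_f1 = f1_score(m["precision"], m["recall"])
    missed_tl = 1 - m["recall"]  # fraction of guideline-eligible TL cases not identified
    results[model] = {"f1": recomputed_f1, "missed_tl": missed_tl}
    print(
        f"{model}: recomputed F1 = {recomputed_f1:.3f} "
        f"(reported {m['f1']:.3f}), missed TL candidates = {missed_tl:.1%}"
    )
```

The clinician's miss rate computed this way is about 91.7%, which matches the statement that over 90% of eligible lobectomy cases were missed.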
Juaçaba et al. (Wed,) studied this question.