Here, we report our approach to the NTCIR-18 RadNLP2024 Shared Task (Japanese Track, Main Task). In this study, we developed a system that determines the TNM classification of lung cancer from Japanese radiology reports. Specifically, we provided Google DeepMind’s Gemini 2.0 Flash Experimental (gemini-2.0-flash-exp) with a prompt that combines Chain-of-Thought (CoT) reasoning and Many-Shot In-Context Learning (ICL), enabling automatic prediction of the T, N, and M factors for each case. Beyond accuracy, interpretability is crucial in the medical domain, so having the model output the rationale for its TNM classification provides a degree of transparency. Moreover, we improved inference accuracy by including numerous CoT-based reasoning examples that explain how the TNM classification is derived, written by a radiologist with 5 years of dedicated experience in diagnostic radiology. Furthermore, to address privacy concerns and the need for local inference without network connectivity in clinical settings, we performed Supervised Fine-Tuning (SFT) on Gemma2-9b-it, a comparatively lightweight open-source model. Training it on data that included the CoT-based reasoning steps leading to each TNM classification also improved its inference accuracy. These findings demonstrate that additional data and prompting strategies supporting large language model (LLM)-based inference can be highly effective in automating TNM classification, and they indicate that interpretability is achievable in LLM-based medical applications.
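The abstract does not give the actual prompt, so the following is only a minimal sketch of how a many-shot CoT prompt for TNM classification might be assembled and how a final answer could be parsed from the model output. Everything here is an assumption for illustration: the `Example` fields, the instruction wording, the `Answer: T? N? M?` output format, the `to_sft_record` prompt/completion layout, and the demonstration report are hypothetical and are not the authors' template or data. No model API is called; the resulting string would be sent to whichever model is used (for example, gemini-2.0-flash-exp).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """Hypothetical demonstration: a report, a radiologist-written rationale,
    and the gold T, N and M categories."""
    report: str
    reasoning: str
    t: str
    n: str
    m: str


# Hypothetical instruction text; the real prompt is not given in the abstract.
INSTRUCTION = (
    "Read the radiology report of a lung cancer case. Explain step by step "
    "how the T, N and M factors are determined, then give the final answer "
    "on one line in the form 'Answer: T? N? M?'."
)


def format_example(ex: Example) -> str:
    # Each demonstration shows the reasoning before the answer (CoT).
    return (
        f"Report:\n{ex.report}\n"
        f"Reasoning:\n{ex.reasoning}\n"
        f"Answer: {ex.t} {ex.n} {ex.m}\n"
    )


def build_prompt(examples: list[Example], query_report: str) -> str:
    # Many-shot ICL: concatenate many worked demonstrations, then leave the
    # reasoning for the new case open for the model to complete.
    shots = "\n".join(format_example(ex) for ex in examples)
    return f"{INSTRUCTION}\n\n{shots}\nReport:\n{query_report}\nReasoning:\n"


# Matches e.g. "Answer: T1c N0 M0"; trailing punctuation is not captured.
ANSWER_RE = re.compile(
    r"Answer:\s*(T[0-9A-Za-z]+)\s+(N[0-9A-Za-z]+)\s+(M[0-9A-Za-z]+)"
)


def parse_tnm(model_output: str) -> Optional[dict[str, str]]:
    # Use the last answer line, ignoring any earlier occurrences.
    matches = ANSWER_RE.findall(model_output)
    if not matches:
        return None
    t, n, m = matches[-1]
    return {"T": t, "N": n, "M": m}


def to_sft_record(ex: Example) -> dict[str, str]:
    # The same demonstration, split into an input and a CoT target, could
    # serve as one supervised fine-tuning pair (hypothetical field names).
    return {
        "prompt": f"{INSTRUCTION}\n\nReport:\n{ex.report}\nReasoning:\n",
        "completion": f"{ex.reasoning}\nAnswer: {ex.t} {ex.n} {ex.m}",
    }


# Hypothetical demonstration (not from the shared-task data).
demo = Example(
    report="A 25 mm solid nodule in the right upper lobe. "
           "No enlarged lymph nodes. No distant metastasis.",
    reasoning="The tumor measures 25 mm (>2 cm and <=3 cm), so T1c. "
              "No regional lymph node metastasis, so N0. "
              "No distant metastasis, so M0.",
    t="T1c", n="N0", m="M0",
)
```

In practice `build_prompt` would receive many such demonstrations rather than one, and `parse_tnm` returns `None` when the output lacks a well-formed answer line, which lets malformed generations be flagged instead of silently scored.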
Keisuke Hidaka studied this question.