What question did this study set out to answer?

The aim is to automate TNM classification for lung cancer using Japanese radiology reports with improved interpretability.

April 1, 2026Open Access

ORAD at NTCIR-18 RadNLP 2024 Shared Task

Key Points

The aim is to automate TNM classification for lung cancer using Japanese radiology reports with improved interpretability.
Developed a system utilizing Google's Gemini 2.0 Flash Experimental for TNM classification.
Employed Chain-of-Thought and Many-Shot In-Context Learning for prompt design.
Performed Supervised Fine-Tuning with Gemma2-9b-it for local inference.
Included CoT-based examples in training for better reasoning and accuracy.
Improved inference accuracy in TNM classification due to CoT-based reasoning examples.
Demonstrated feasibility of interpretability in large language model applications.
Addressed privacy concerns with local deployment options.

Abstract

Here, we report our approach to the NTCIR-18 RadNLP2024 Shared Task (Japanese Track, Main Task). In this study, we developed a system to determine the TNM classification from lung cancer using Japanese radiology reports. Specifically, we provided Google DeepMind’s Gemini 2.0 Flash Experimental (gemini-2.0-flash-exp) with a prompt that combines Chain-of-Thought (CoT) and Many-Shot In-Context Learning (ICL), enabling automatic prediction of the T, N, and M factors for each case. Besides accuracy, interpretability is crucial in the medical domain; thus, having the model output the rationale for its TNM classification ensures a degree of transparency. Moreover, by including numerous examples of CoT-based reasoning—written by a radiologist with 5 years of dedicated experience in diagnostic radiology—to explain how the TNM classification is derived, we achieved improved inference accuracy. Furthermore, to address privacy concerns and the need for local inference without network connectivity in clinical settings, we performed Supervised Fine-Tuning (SFT) using Gemma2-9b-it, a comparatively lightweight open-source model. By providing the model with CoT-based reasoning steps leading to TNM classification as training data, we observed improved inference accuracy. These findings demonstrate that additional data and prompt strategies to support large language model (LLM)-based inference can be highly effective in automating TNM classification while also indicating the feasibility of realizing interpretability in LLM-based medical applications.

Read Full Paperexternally

Ask AI

Mark Helpful

Bookmark

Relay

View Full Paper