April 1, 2026 (Open Access)

NURad at the NTCIR-18 RadNLP Task

Key Points

The aim is to develop a system that automatically extracts TNM classification from radiology reports to assist clinicians.
Participated in the NTCIR-18 RadNLP task
Explored prompts and various large language models including Llama3 and Google Notebook LM
Utilized both Japanese and English datasets for training
Investigated fine-tuning with clinical data
Achieved a TNM (fine) score of 0.93 on the validation dataset
Observed a decrease in the score to 0.54 on the test dataset
The decline was more pronounced for the T classification than for the N and M classifications

Abstract

Lung cancer is the most common cause of cancer death in Japan. The TNM classification is essential for lung cancer diagnosis and treatment planning, and CT imaging plays a crucial role in its evaluation. However, the number of thoracic radiologists in Japan is limited. A system that automatically extracts the TNM classification from radiology reports would therefore benefit radiologists and other clinicians. Large language models (LLMs) have recently shown remarkable progress in natural language processing, opening new possibilities for medical applications. The NURad team participated in the NTCIR-18 Natural Language Processing for Radiology (RadNLP) task. This paper describes our approach to the problem and discusses the official results. We explored different prompts, LLMs (Llama3, OpenAI o1-pro, Google Gemini 2.0, and Google Notebook LM), and data types (Japanese and English). We also investigated fine-tuning with clinical data. The final model used a short prompt, was trained on both Japanese and English datasets using Google Notebook LM, and did not incorporate clinical data. It achieved a TNM (fine) score of 0.93 on the validation dataset, but the score decreased to 0.54 on the test dataset. This decline was more pronounced for the T classification than for the N and M classifications. This study demonstrates the potential of LLMs for automated TNM classification from radiology reports, but it also highlights challenges in generalizing to unseen data, particularly for the T classification. Further research is needed to improve the robustness and accuracy of LLM-based TNM classification systems.
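
The abstract reports separate behavior for the T, N, and M components, so it may help to see how per-component and joint scores can be computed from model output. The sketch below is illustrative only: the abstract does not define the official "TNM (fine)" metric, so the joint score here is assumed to be exact-match accuracy over all three components. The names `TNM`, `parse_tnm`, and `accuracy_report`, as well as the label pattern, are hypothetical and not taken from the paper or the task organizers.

```python
import re
from typing import NamedTuple, Optional


class TNM(NamedTuple):
    """One TNM label, stored as lowercase component strings (e.g. '2a', '1', '0')."""
    t: str
    n: str
    m: str


# Assumed label pattern for illustration: matches strings such as
# "T2a N1 M0" or "cT1bN0M1a". The real task labels may differ.
_TNM_RE = re.compile(
    r"T\s*(is|[0-4][a-c]?|x)\s*N\s*([0-3]|x)\s*M\s*([01][a-c]?|x)",
    re.IGNORECASE,
)


def parse_tnm(text: str) -> Optional[TNM]:
    """Extract the first TNM label found in free-text model output, or None."""
    match = _TNM_RE.search(text)
    if match is None:
        return None
    t, n, m = (group.lower() for group in match.groups())
    return TNM(t, n, m)


def accuracy_report(gold: list[TNM], pred: list[Optional[TNM]]) -> dict[str, float]:
    """Per-component accuracy plus joint exact-match accuracy.

    An unparseable prediction (None) counts as wrong for every component.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    hits = {"T": 0, "N": 0, "M": 0, "TNM": 0}
    for g, p in zip(gold, pred):
        if p is None:
            continue
        hits["T"] += g.t == p.t
        hits["N"] += g.n == p.n
        hits["M"] += g.m == p.m
        hits["TNM"] += g == p  # all three components must match
    return {key: count / len(gold) for key, count in hits.items()}


if __name__ == "__main__":
    gold = [TNM("2a", "1", "0"), TNM("1b", "0", "0")]
    pred = [parse_tnm("cT2aN1M0"), parse_tnm("T1c N0 M0")]
    print(accuracy_report(gold, pred))
```

Reporting the components separately, as in this sketch, is what makes a drop concentrated in the T classification visible, since a single joint score would hide which component failed.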
