Lung cancer is the most common cause of cancer death in Japan. The TNM classification is essential for lung cancer diagnosis and treatment planning, and CT imaging plays a crucial role in its evaluation. However, the number of thoracic radiologists in Japan is limited, so a system that automatically extracts the TNM classification from radiology reports would benefit radiologists and other clinicians. Large language models (LLMs) have recently shown remarkable progress in natural language processing, opening new possibilities for medical applications. The NURad team participated in the NTCIR-18 Natural Language Processing for Radiology (RadNLP) task. This paper describes our approach to the problem and discusses the official results. We explored different prompts, LLMs (Llama3, OpenAI O1pro, Google Gemini 2.0, Google Notebook LM), and data types (Japanese and English). We also investigated fine-tuning with clinical data. The final model used a short prompt with Google Notebook LM, was trained on both the Japanese and English datasets, and did not incorporate clinical data. It achieved a TNM (fine) score of 0.93 on the validation dataset, but the score fell to 0.54 on the test dataset. This decline was more pronounced for the T classification than for the N and M classifications. This study demonstrates the potential of LLMs for automated TNM classification from radiology reports, but also highlights challenges in generalizing to unseen data, particularly for the T classification. Further research is needed to improve the robustness and accuracy of LLM-based TNM classification systems.
Higashi et al. studied this question.