What question did this study set out to answer?

This research aims to automate TNM classification for radiology reports using large language models.

April 1, 2026Open Access

Hirosaki team at the NTCIR-18 RadNLP2024 Shared Task: Few-Shot Learning and Prompt Engineering for TNM Staging Classification of English Radiology Reports Using Large Language Models.

Key Points

This research aims to automate TNM classification for radiology reports using large language models.
Participated in the NTCIR-18 RadNLP2024 shared task
Utilized large language models: GPT-4o-mini, GPT-4o, and o1-mini
Implemented cosine similarity for embedding-based retrieval
Applied few-shot learning techniques for improved accuracy
o1-mini achieved the highest classification accuracy
Test data accuracy declined by about 30% compared to validation data
Challenges identified in classifying the T factor related to tumor size and infiltration

Abstract

We participated in the NTCIR-18 RadNLP2024 shared task 1 and investigated the automation of TNM classification using large language models (LLMs), specifically GPT-4o-mini, GPT-4o, and o1-mini. Our approach integrates cosine similarity-based retrieval using embedding vectors and few-shot learning to enhance classification accuracy. As a result of the experiment, o1-mini achieved the highest classification accuracy. However, the accuracy on the test data declined by approximately 30% compared to the validation data. In particular, the low classification accuracy of the T factor highlighted challenges in interpreting tumor size and extent of infiltration. In this paper, we analyze these results and report our approach to this task along with official results.

Read Full Paperexternally

AIに質問

Bookmark

View Full Paper

Cite This Study

Mori et al. (Fri,) studied this question.

synapsesocial.com/papers/69cd7b475652765b073a9239 https://doi.org/https://doi.org/10.20736/0002002065

AIに質問

Bookmark

View Full Paper