Lung cancer management relies heavily on precise staging, which is traditionally derived from radiology reports of imaging studies such as CT and MRI. However, these reports often lack explicit staging details, so healthcare professionals must extract the relevant information manually. To address this issue, we propose an automated solution, submitted to the RadNLP (Natural Language Processing for Radiology) shared task at the NTCIR-18 international conference. Our approach applies tailored Natural Language Processing (NLP) techniques to radiology reports. In this paper, we describe our methodology for both parts of RadNLP: the subtask, which involves document segmentation to identify eight key classes within radiology reports, and the main task, which focuses on automated TNM staging of lung cancer. For the subtask, we employed an ensemble of three fine-tuned, hyperparameter-optimized BERT-based medical language models, which achieved an overall micro F2 score of 0.9433 and ranked first in the competition. For the main task, we developed individual pipelines for T, N, and M staging, consisting of BERT-based models and large language models (LLMs) in a multistage processing framework, which achieved a joint accuracy of 0.5679 and placed 4th overall in the competition. Our solution streamlines the extraction of critical information and aims to improve the accuracy and efficiency of cancer staging, ultimately supporting clinical decision-making and contributing to better patient outcomes.
Bhawnani et al. studied this question.
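To make the two reported metrics concrete, the following is a minimal sketch of how a micro-averaged F2 score and a joint TNM accuracy could be computed. It assumes the standard definitions: micro F-beta pools true positives, false positives, and false negatives over all items and classes before computing the score, and joint accuracy counts a report as correct only when T, N, and M all match. The shared task's official scoring script may differ in detail, and the function names, class labels, and toy data below are hypothetical.

```python
from typing import Sequence, Set, Tuple


def micro_fbeta(gold: Sequence[Set[str]],
                pred: Sequence[Set[str]],
                beta: float = 2.0) -> float:
    """Micro-averaged F-beta: pool TP/FP/FN over all items and classes.

    Assumes each item carries a set of class labels (hypothetical format).
    With beta = 2, recall is weighted more heavily than precision.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in gold
        fp += len(p - g)   # labels predicted but absent from gold
        fn += len(g - p)   # gold labels that were missed
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def joint_accuracy(gold: Sequence[Tuple[str, str, str]],
                   pred: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of reports whose (T, N, M) triple is entirely correct."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy data with hypothetical class names and stages (not from the paper).
gold_labels = [{"class_a"}, {"class_b", "class_c"}, {"class_d"}]
pred_labels = [{"class_a"}, {"class_b"}, {"class_d", "class_c"}]
print(micro_fbeta(gold_labels, pred_labels))  # TP=3, FP=1, FN=1 -> 0.75

gold_tnm = [("T2", "N0", "M0"), ("T4", "N3", "M1c"), ("T1c", "N1", "M0")]
pred_tnm = [("T2", "N0", "M0"), ("T4", "N2", "M1c"), ("T1c", "N1", "M0")]
print(joint_accuracy(gold_tnm, pred_tnm))  # 2 of 3 reports fully correct
```

Under this reading, joint accuracy is a strict metric: a single wrong component (here, N2 instead of N3) makes the whole report count as incorrect. That strictness may partly explain why a joint score can sit well below the per-component accuracies.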