Automated extraction of TNM staging information from radiology reports is challenging because it requires both understanding complex clinical language and applying detailed staging criteria. In this paper, we present our approach to the NTCIR-18 RadNLP 2024 shared task on automated lung cancer staging from Japanese radiology reports. We developed a hybrid system that combines large language models (LLMs) with rule-based processing in a two-stage pipeline: GPT-4o models first extract structured information from each report, and classification rules then determine the appropriate TNM stages. The pipeline uses a different strategy for each classification component: a rule-based method for the complex T classification and a more flexible LLM-based approach for N and M classification. The system performed well on the validation dataset (joint accuracy of 0.8148), but T classification performance dropped sharply on the test dataset (from 0.8704 to 0.4769), while N and M classification maintained high accuracy. This disparity highlights the trade-off between rule-based precision and LLM flexibility in clinical NLP systems. Our findings suggest that balancing the two approaches and using larger development datasets could improve the robustness of automated cancer staging systems in real-world clinical applications.
Yamagishi et al. studied this question.
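To make the two-stage design concrete, the sketch below shows how a rule-based T classifier might consume structured output from an LLM extraction step. This is not the authors' implementation, and the abstract does not specify their rules. Everything here is an assumption. The JSON schema and the names `TumorFindings`, `parse_llm_output`, and `classify_t` are hypothetical. The LLM call is replaced by a literal JSON string. The size thresholds and invasion sites follow a simplified subset of the UICC 8th edition T descriptors. The sketch ignores part-solid nodules (where only the solid component is measured), Tis/T1mi, and the N and M components.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical schema for the stage-1 LLM output (field names are illustrative only).
@dataclass
class TumorFindings:
    size_mm: Optional[float] = None                      # largest tumor dimension in mm
    invasion_sites: Set[str] = field(default_factory=set)
    same_lobe_nodule: bool = False                       # separate nodule(s) in the same lobe
    other_ipsilateral_lobe_nodule: bool = False          # nodule(s) in another ipsilateral lobe

# Simplified, non-exhaustive subset of UICC 8th edition T descriptors (assumption).
T2_SITES = {"visceral_pleura", "main_bronchus"}
T3_SITES = {"chest_wall", "parietal_pleura", "phrenic_nerve", "parietal_pericardium"}
T4_SITES = {"diaphragm", "mediastinum", "heart", "great_vessels", "trachea",
            "carina", "esophagus", "vertebral_body", "recurrent_laryngeal_nerve"}

# Categories from least to most advanced; used to pick the highest applicable one.
ORDER = ["T1a", "T1b", "T1c", "T2a", "T2b", "T3", "T4"]

def parse_llm_output(raw: str) -> TumorFindings:
    """Stage 1 -> stage 2 handoff: turn the LLM's JSON answer into typed fields."""
    data = json.loads(raw)
    return TumorFindings(
        size_mm=data.get("size_mm"),
        invasion_sites=set(data.get("invasion_sites", [])),
        same_lobe_nodule=bool(data.get("same_lobe_nodule", False)),
        other_ipsilateral_lobe_nodule=bool(data.get("other_ipsilateral_lobe_nodule", False)),
    )

def t_from_size(size_mm: Optional[float]) -> Optional[str]:
    """Map the largest dimension (mm) to a size-based T category."""
    if size_mm is None:
        return None
    for limit, category in [(10, "T1a"), (20, "T1b"), (30, "T1c"),
                            (40, "T2a"), (50, "T2b"), (70, "T3")]:
        if size_mm <= limit:
            return category
    return "T4"  # > 70 mm

def classify_t(f: TumorFindings) -> str:
    """Stage 2: apply deterministic rules and return the highest applicable T category."""
    candidates = []
    size_t = t_from_size(f.size_mm)
    if size_t is not None:
        candidates.append(size_t)
    if f.invasion_sites & T2_SITES:
        candidates.append("T2a")
    if f.invasion_sites & T3_SITES or f.same_lobe_nodule:
        candidates.append("T3")
    if f.invasion_sites & T4_SITES or f.other_ipsilateral_lobe_nodule:
        candidates.append("T4")
    if not candidates:
        return "TX"  # nothing usable was extracted
    return max(candidates, key=ORDER.index)

if __name__ == "__main__":
    raw = '{"size_mm": 28, "invasion_sites": ["visceral_pleura"]}'
    print(classify_t(parse_llm_output(raw)))  # T2a: size gives T1c, pleural invasion upgrades to T2a
```

Keeping the rules deterministic makes each decision traceable to an extracted field. The same design also means that any report phrasing the extraction schema or rule set did not anticipate can silently produce a wrong T category. That is one plausible reading of the validation-to-test drop reported above, though the abstract does not establish the cause.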