This study developed and preliminarily evaluated a prototype tool that applies large language model (LLM)-based named entity recognition (NER) to pseudonymize clinical text written in Japanese, an agglutinative language. The system was built on the Presidio framework with the GiNZA BERT model, which was trained for NER tasks. The approach achieved a precision of 0.672, a recall of 0.995, and an F-score of 0.802. The findings suggest that accuracy could be improved further by training a medical domain-specific LLM on the NER task and by adding rule-based processing that incorporates morphological analysis.
Kimura et al. (Thu,) studied this question.
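The reported metrics can be checked against each other with a short calculation. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly but which is consistent with the reported values. The helper name `f_score` is hypothetical and is not part of the authors' tool.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1, i.e. beta = 1.
    """
    if precision + recall == 0:
        # Avoid division by zero when no entities are predicted or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.672
reported_recall = 0.995

f1 = f_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

Under this assumption the result rounds to the reported 0.802. The gap between the two inputs shows where the error lies: a recall of 0.995 means almost every target entity was found, while a precision of 0.672 means roughly one in three flagged spans was not a true target.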