August 7, 2025 | Open Access

Evaluation of Pseudonymization of Japanese Progress Notes by LLM

Key Points

A prototype tool pseudonymizes Japanese clinical text using a large language model (LLM)-based named entity recognition (NER) approach.
In a preliminary evaluation, it achieved a precision of 0.672, a recall of 0.995, and an F-score of 0.802.
The system is built on the Presidio framework with the GiNZA BERT model, which was trained for NER tasks.
Accuracy may improve further through NER task training with a medical domain-specific LLM and rule-based processing that incorporates morphological analysis.
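
The reported F-score can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall.

    Assumption: the study's "F-score" is F1 (beta = 1).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the key points.
reported_precision = 0.672
reported_recall = 0.995

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.802
```

Under that assumption the three figures are mutually consistent. The combination of very high recall and moderate precision means that nearly every true identifier was caught, while roughly a third of the masked spans were not identifiers.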

Abstract

This study developed and preliminarily evaluated a prototype tool that applies a large language model (LLM)-based named entity recognition (NER) approach to pseudonymize clinical text written in Japanese, an agglutinative language. The system was built using the Presidio framework with the GiNZA BERT model, which was trained for NER tasks. The approach achieved a precision of 0.672, a recall of 0.995, and an F-score of 0.802. The findings suggest that accuracy could be improved further by NER task training with a medical domain-specific LLM and by rule-based processing that incorporates morphological analysis.
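
To illustrate what happens after the NER step, the sketch below replaces detected spans with consistent placeholders, so that the same name always maps to the same pseudonym. It is a minimal, library-free illustration and not the authors' implementation or the Presidio API: the `EntitySpan` type, the `pseudonymize` function, the entity labels, and the example note are all invented for this example, and the spans are assumed to be non-overlapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """A detected entity (hypothetical structure, not a Presidio type)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative label such as "PERSON"


def pseudonymize(text: str, spans: list[EntitySpan]) -> str:
    """Replace each span with a placeholder that is stable per surface form.

    Assumes spans do not overlap.
    """
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    ordered = sorted(spans, key=lambda s: s.start)

    # Number placeholders in reading order; repeated mentions share one.
    for span in ordered:
        key = (span.label, text[span.start:span.end])
        if key not in placeholders:
            counters[span.label] = counters.get(span.label, 0) + 1
            placeholders[key] = f"<{span.label}_{counters[span.label]}>"

    # Substitute from right to left so earlier offsets remain valid.
    result = text
    for span in reversed(ordered):
        key = (span.label, text[span.start:span.end])
        result = result[:span.start] + placeholders[key] + result[span.end:]
    return result


# Invented example note; the offsets stand in for NER model output.
note = "山田太郎さんは東京病院を受診。山田太郎さんに処方。"
spans = [
    EntitySpan(0, 4, "PERSON"),
    EntitySpan(7, 11, "FACILITY"),
    EntitySpan(15, 19, "PERSON"),
]
print(pseudonymize(note, spans))
# <PERSON_1>さんは<FACILITY_1>を受診。<PERSON_1>さんに処方。
```

Because Japanese text has no spaces between words, the spans are character offsets rather than token indices. A false positive from the NER model simply masks extra text, which lowers precision without exposing identifiers.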
