What question did this study set out to answer?

This work aims to assess the effectiveness of various deep learning models for detecting personally identifiable information in unstructured text.

April 1, 2026 | Open Access

On the Applicability of LLMs and SLMs for Privacy-Preserving Named Entity Recognition in Financial Applications

Key points

Evaluated multiple language models including DistilBERT and RoBERTa.
Used performance indices such as accuracy, precision, recall, and F1-score.
Proposed a novel transformer-based architecture for improved PII recognition.
Conducted comparative analysis on the AI4Privacy PII 43 K dataset.
Some small language models (SLMs) performed on par with large language models (LLMs) on this specific dataset.
Identified strengths and limitations of existing approaches to privacy-preserving information extraction.
Demonstrated the effectiveness of SLMs in enhancing personal data detection in financial applications.

Abstract

This work explores how deep learning models with different numbers of parameters can be applied to detect personal data in unstructured text using Named Entity Recognition (NER) techniques. We evaluate a range of language models (LMs): DistilBERT-base-uncased, DistilBERT-base-cased, BERT-base-uncased, BERT-base-cased, BERT-large-uncased, BERT-large-cased, ModernBERT-base, ModernBERT-large, nomic-BERT-2048, RoBERTa-base, DistilRoBERTa-base, RoBERTa-large, DeBERTa-v3-xsmall, DeBERTa-v3-small, and DeBERTa-v3-base. Performance is measured with accuracy, precision, recall, and F1-score. Our experiments show that some Small Language Models (SLMs) perform on par with some of the corresponding Large Language Models (LLMs) on the specific Personally Identifiable Information (PII) dataset used, which supports their use for personal data detection, a task of paramount importance in financial applications. Moreover, we propose a novel architecture based on an optimized transformer fine-tuning strategy to improve PII recognition across diverse contexts, and we conduct an extensive comparative analysis of this architecture against all relevant existing approaches reported in the literature. This evaluation, performed on the AI4Privacy PII 43 K dataset, encompasses every publicly available work we identified and provides a thorough benchmarking of our methods within the current research field. The results highlight both the strengths and limitations of existing solutions and demonstrate the effectiveness of SLMs in addressing the challenges of privacy-preserving information extraction.
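
As an illustration of the four metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score could be computed for a tagged sequence. It rests on assumptions not stated in the abstract: scores are computed per token rather than per entity, they are micro-averaged, and the `O` (outside) tag is treated as the negative class. The function name `token_level_scores` and the tags `B-NAME`, `I-NAME`, and `B-IBAN` are hypothetical and are not taken from the paper or the dataset.

```python
from typing import Dict, List


def token_level_scores(
    gold: List[str], pred: List[str], outside: str = "O"
) -> Dict[str, float]:
    """Micro-averaged token-level scores (an assumed evaluation setup).

    Any tag other than `outside` counts as a positive (PII) prediction.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one token is required")

    pairs = list(zip(gold, pred))
    # Accuracy counts every token, including correctly predicted "O" tokens.
    correct = sum(g == p for g, p in pairs)
    # True positive: a PII token predicted with exactly the right tag.
    tp = sum(g == p and g != outside for g, p in pairs)
    predicted_positive = sum(p != outside for p in pred)
    gold_positive = sum(g != outside for g in gold)

    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / gold_positive if gold_positive else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": correct / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical tags for a six-token sentence (not real dataset labels).
gold = ["O", "B-NAME", "I-NAME", "O", "B-IBAN", "O"]
pred = ["O", "B-NAME", "O", "O", "B-IBAN", "B-NAME"]
scores = token_level_scores(gold, pred)
print(scores)
```

Because most tokens in running text are not PII, accuracy under this setup is dominated by the `O` class, which is one reason to read it alongside precision, recall, and F1-score rather than on its own.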
