What question did this study set out to answer?

To propose a framework that analyzes the relationship between the utility and privacy risks of synthetic text generated by large language models.

June 8, 2026 · Open Access

A Unified Evaluation Framework for Utility and Privacy Risks of LLM-Generated Synthetic Text Data

Lubana Isaoglu (Istanbul University-Cerrahpaşa), Zeynep Orman (Istanbul University-Cerrahpaşa)

Key Points

Aimed to analyze the relationship between the utility and privacy risks of synthetic text generated by large language models.
Developed a unified evaluation framework combining utility metrics and privacy attacks in a single experimental pipeline.
Evaluated the framework using GPT-2 fine-tuned on AG News and PubMed datasets.
Measured semantic fidelity, distributional alignment, memorization behavior, and membership inference vulnerability.
On AG News, fine-tuning raised BERTScore to 0.81 and membership inference ROC-AUC from 0.45 to 0.64.
Observed the same directional trend on PubMed: improved semantic fidelity alongside higher canary memorization and membership inference vulnerability.
Canary exposure analysis revealed memorization of rare sequences after fine-tuning.
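
To make the membership inference ROC-AUC figure concrete, here is a minimal sketch of how such a score can be computed. It assumes a simple loss-based attack, in which examples with lower model loss are guessed to be training members; the summary does not say which attack the authors used. The function names and the loss values are hypothetical and chosen only for illustration.

```python
def roc_auc(member_scores, nonmember_scores):
    """ROC-AUC as the probability that a random member outscores a random
    non-member, counting ties as half a win (Mann-Whitney formulation)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_auc(member_losses, nonmember_losses):
    """Assumed attack: lower loss suggests membership, so negate the losses
    to obtain a score where higher means 'more likely a training member'."""
    return roc_auc([-x for x in member_losses], [-x for x in nonmember_losses])


# Hypothetical per-example losses, not taken from the paper.
member_losses = [1.2, 0.9, 1.5, 0.7]
nonmember_losses = [1.4, 1.1, 1.8, 1.6]

auc = loss_attack_auc(member_losses, nonmember_losses)
print(f"membership inference ROC-AUC: {auc:.4f}")
```

An AUC near 0.5 means the attacker cannot tell members from non-members, and values further above 0.5 indicate greater leakage. That is why the reported rise from 0.45 to 0.64 is read as increased privacy risk.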

Abstract

The increasing use of Large Language Models (LLMs) has enabled the generation of high-quality synthetic text, providing a potential alternative to sensitive real-world datasets in domains where privacy concerns limit data sharing. However, synthetic text is not inherently privacy-safe. Fine-tuning generative models on domain-specific data can enhance semantic fidelity while simultaneously increasing the risk of memorization and information leakage. In this work, we propose a unified evaluation framework to systematically analyze the connection between utility and privacy risk in LLM-generated synthetic text. Our framework combines semantic utility metrics and practical privacy attacks within a single, controlled pipeline. Unlike prior studies, which often assess text quality and privacy risk separately, it jointly measures semantic fidelity, distributional alignment, memorization behavior, and membership inference vulnerability under the same protocol, enabling direct analysis of the utility–privacy trade-off in synthetic text generation. We empirically evaluate the framework using GPT-2 fine-tuned on two datasets: AG News as a general-domain benchmark and PubMed abstracts as a biomedical-domain validation dataset. Results show that fine-tuning improves semantic utility but also increases empirical privacy risk. On AG News, BERTScore increases to 0.81, while membership inference ROC-AUC rises from 0.45 to 0.64. The PubMed experiment shows the same directional trend, with improved semantic fidelity accompanied by higher canary memorization and membership inference vulnerability. Additionally, canary exposure analysis indicates clear memorization of rare sequences after fine-tuning.
These findings demonstrate a measurable trade-off between utility and privacy in synthetic text generation and highlight the importance of jointly evaluating both dimensions. The proposed framework provides a reproducible methodology for assessing the privacy risks of high-quality synthetic text and supports more responsible deployment of LLM-based synthetic data systems.
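
The canary exposure analysis mentioned in the abstract can be sketched as follows. This assumes the commonly used rank-based definition, in which exposure is log2 of the candidate-space size minus log2 of the inserted canary's rank by model loss; the abstract does not state the exact definition the authors used. The function name and all loss values are hypothetical.

```python
import math


def canary_exposure(canary_loss, candidate_losses):
    """Rank-based exposure (assumed definition): log2(|R|) - log2(rank),
    where R is the set of candidates including the canary itself and
    rank 1 means the canary has the lowest loss of all."""
    rank = 1 + sum(1 for loss in candidate_losses if loss < canary_loss)
    total = len(candidate_losses) + 1
    return math.log2(total) - math.log2(rank)


# Hypothetical losses for 7 random candidate sequences that were never
# inserted into the training data.
candidate_losses = [4.1, 3.8, 4.5, 3.9, 4.2, 4.0, 4.4]

# A memorized canary scores lower loss than every candidate.
memorized = canary_exposure(2.0, candidate_losses)
# A canary that was not memorized looks like any other candidate.
not_memorized = canary_exposure(4.6, candidate_losses)

print(memorized, not_memorized)
```

Under this definition, exposure reaches its maximum when the model ranks the canary as more likely than every alternative and falls to zero when the canary ranks last. A rise in exposure after fine-tuning is the kind of signal the abstract describes as memorization of rare sequences.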


Cite This Study

Isaoglu et al. studied this question.

synapsesocial.com/papers/6a265ca8ad53cfb9357c5e19 https://doi.org/10.64808/engineeringperspective.1910777
