What type of study is this?

This is a quantitative study.

October 17, 2025 | Open Access

Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs

Tevin Atwal, Chan Nam Tieu, Yefeng Yuan; Nanchang University

Key Points

Diverse and privacy-preserving synthetic data generation remains challenging for large language models.
Evaluation metrics reveal significant limitations in linguistic diversity and re-identification risk in generated data.
The proposed prompt-based approach aims to enhance synthetic review diversity while protecting user privacy.
Findings highlight the need for comprehensive evaluation to ensure reliable data generation in various applications.

Abstract

The increasing use of synthetic data generated by Large Language Models (LLMs) presents both opportunities and challenges in data-driven applications. While synthetic data provides a cost-effective, scalable alternative to real-world data for model training, its diversity and privacy risks remain underexplored. Focusing on text-based synthetic data, we propose a comprehensive set of metrics to quantitatively assess the diversity (i.e., linguistic expression, sentiment, and user perspective) and privacy (i.e., re-identification risk and stylistic outliers) of synthetic datasets generated by several state-of-the-art LLMs. Experimental results reveal significant limitations in LLMs' ability to generate diverse and privacy-preserving synthetic data. Guided by the evaluation results, we propose a prompt-based approach to enhance the diversity of synthetic reviews while preserving reviewer privacy.
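
To make the idea of quantifying linguistic diversity concrete, here is a minimal sketch of one widely used lexical measure, distinct-n (the share of n-grams in a corpus that are unique). This is an illustrative assumption only: the abstract does not specify which metrics the paper uses, and the function name `distinct_n` and whitespace tokenization are hypothetical choices made for this example.

```python
def distinct_n(texts, n=2):
    """Return the fraction of n-grams across all texts that are unique.

    Illustrative only: not the paper's metric. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        # Sliding window of n consecutive tokens; empty if the text is too short.
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # An empty corpus (or only too-short texts) is reported as 0.0 diversity.
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    reviews = [
        "the food was great",
        "the food was terrible",
        "service felt slow but friendly",
    ]
    print(f"distinct-2: {distinct_n(reviews, n=2):.2f}")
```

A score near 1.0 means almost no n-gram is repeated across the synthetic reviews, while a low score suggests templated, repetitive output. A measure like this captures only surface-level wording, not sentiment or user perspective, which the abstract lists as separate diversity dimensions.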
