What question did this study set out to answer?

The study set out to develop a persona-driven method that uses large language models (LLMs) to generate synthetic mobility survey data.

February 26, 2026 · Open Access

LLM-PDM: an LLM Persona-Driven method for replicating personal Mobility preferences at scale

Key Points

Proposes LLM-PDM, a persona-driven method that uses large language models to generate synthetic mobility survey data.
Defines representative personas, each with specific sociodemographic attributes.
Uses a guided prompting strategy to calibrate the synthetic data distributions against real-world population statistics.
Evaluates the method against the MiD 2017 dataset using MAE, RMSE, Jensen-Shannon distance, entropy, conditional entropy and Earth Mover's Distance.
Achieves low distributional errors (e.g. mean absolute error below 3%).
Preserves key distributions and relationships in the generated populations.
Outperforms several LLM baselines in synthetic data quality.
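
To make the persona and guided-prompting idea more concrete, the following is a minimal sketch of how a persona could be turned into a survey prompt. It is not the authors' implementation: the `Persona` fields, the `build_prompt` helper, the prompt wording, and the idea of passing target answer shares as guidance are all assumptions made for illustration, and no LLM is actually called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Persona:
    # Hypothetical sociodemographic attributes; the paper's actual
    # attribute set is not given in this summary.
    age_group: str
    employment: str
    household_size: int
    region_type: str


def build_prompt(
    persona: Persona,
    question: str,
    options: List[str],
    target_shares: Optional[Dict[str, float]] = None,
) -> str:
    """Build a survey prompt that asks an LLM to answer as the given persona.

    If target_shares is provided, a guidance line with approximate answer
    shares is appended (an assumed form of "guided prompting").
    """
    lines = [
        "You are answering a travel survey as the following respondent.",
        f"Age group: {persona.age_group}",
        f"Employment: {persona.employment}",
        f"Household size: {persona.household_size}",
        f"Region type: {persona.region_type}",
        f"Question: {question}",
        "Options: " + ", ".join(options),
    ]
    if target_shares:
        guidance = ", ".join(
            f"{opt}: {target_shares[opt]:.0%}" for opt in options
        )
        lines.append(
            "Among comparable respondents, answers are distributed roughly as: "
            + guidance
        )
    lines.append("Reply with exactly one option.")
    return "\n".join(lines)
```

The returned string would be sent to whichever LLM is used; repeating the call across many personas and aggregating the answers yields a synthetic population whose distributions can then be compared with the survey data.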

Abstract

Traditional travel surveys are costly, time-consuming and face declining response rates, motivating the exploration of artificial data generation methods. In this research, we propose a novel persona-driven method for generating synthetic mobility survey data using Large Language Models (LLMs). The method defines representative personas, each characterized by specific sociodemographic attributes, and prompts an LLM to emulate survey respondents with these personas. A guided prompting strategy is introduced to calibrate the synthetic data distributions so that they closely match real-world population statistics. We evaluate the approach on the German Mobilität in Deutschland 2017 (MiD 2017) dataset. The quality of the LLM-PDM-generated synthetic data is assessed against ground truth using a comprehensive set of metrics, including mean absolute error (MAE), root mean square error (RMSE), Jensen-Shannon distance (JSD), entropy, conditional entropy and the Earth Mover's Distance (EMD). Empirical results demonstrate that the LLM-PDM approach produces high-fidelity synthetic populations that preserve key distributions and relationships present in the real data. Across the case studies, the LLM-PDM method achieves low distributional errors (e.g. MAE below 3%) and captures important joint patterns, significantly outperforming a number of LLM baselines.
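
The metrics listed in the abstract can be sketched for discrete answer distributions as follows. This is an illustrative, standard-library-only sketch and not the authors' evaluation code: the function names are hypothetical, logarithms are taken in base 2, and the EMD variant assumes ordered categories with unit spacing, none of which is specified in the abstract.

```python
import math
from typing import List, Sequence


def mae(p: Sequence[float], q: Sequence[float]) -> float:
    """Mean absolute error between two share vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def rmse(p: Sequence[float], q: Sequence[float]) -> float:
    """Root mean square error between two share vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(a * math.log2(a) for a in p if a > 0)


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits (q must be > 0 wherever p > 0)."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negatives


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth Mover's Distance for ordered categories with unit spacing,
    computed as the sum of absolute differences of the cumulative shares."""
    total, cumulative = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative += a - b
        total += abs(cumulative)
    return total


def conditional_entropy(joint: List[List[float]]) -> float:
    """H(Y|X) = H(X, Y) - H(X) for a joint table indexed as joint[x][y]."""
    flat = [v for row in joint for v in row]
    marginal_x = [sum(row) for row in joint]
    return entropy(flat) - entropy(marginal_x)


# Made-up shares for illustration only; these are not values from MiD 2017.
real_shares = [0.5, 0.25, 0.25]
synthetic_shares = [0.25, 0.5, 0.25]
print("MAE :", mae(real_shares, synthetic_shares))
print("RMSE:", rmse(real_shares, synthetic_shares))
print("JSD :", js_distance(real_shares, synthetic_shares))
print("EMD :", emd_1d(real_shares, synthetic_shares))
```

MAE, RMSE, JSD and EMD compare a synthetic marginal with its real counterpart, while entropy and conditional entropy are computed on each dataset separately and then compared, which is one way to check whether relationships between variables are preserved.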
