Traditional travel surveys are costly and time-consuming and face declining response rates, which motivates the exploration of synthetic data generation methods. In this research, we propose a novel persona-driven method (LLM-PDM) for generating synthetic mobility survey data using Large Language Models (LLMs). The method defines representative personas, each characterized by specific sociodemographic attributes, and prompts an LLM to emulate survey respondents with these personas. A guided prompting strategy is introduced to calibrate the synthetic data distributions so that they closely match real-world population statistics. We evaluate the approach on the German Mobilität in Deutschland 2017 (MiD 2017) dataset. The quality of the LLM-PDM-generated synthetic data is assessed against ground truth using a comprehensive set of metrics: mean absolute error (MAE), root mean square error (RMSE), Jensen-Shannon distance (JSD), entropy, conditional entropy and the Earth Mover's Distance (EMD). Empirical results demonstrate that the LLM-PDM approach produces high-fidelity synthetic populations that preserve key distributions and relationships present in the real data. Across the case studies, the LLM-PDM method achieves low distributional errors (e.g., MAE 3%) and captures important joint patterns, significantly outperforming several LLM baselines.
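To make the evaluation metrics concrete, the following is a minimal sketch of how several of them could be computed between a real and a synthetic categorical distribution. It is an illustration under stated assumptions, not the implementation used in this work: the two share vectors are hypothetical (they are not MiD 2017 values), logarithms are taken to base 2, and the EMD variant assumes ordered categories with unit spacing. Conditional entropy is omitted because it requires joint distributions.

```python
import math

def mae(p, q):
    """Mean absolute error between two share vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def rmse(p, q):
    """Root mean square error between two share vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def entropy(p):
    """Shannon entropy in bits; zero-probability categories contribute nothing."""
    return -sum(a * math.log2(a) for a in p if a > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 are skipped.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def emd_1d(p, q):
    """Earth Mover's Distance for ordered categories with unit spacing
    (assumption): sum of absolute differences of the cumulative shares."""
    total, cumulative = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative += a - b
        total += abs(cumulative)
    return total

# Hypothetical shares over three categories (illustrative only).
real = [0.5, 0.3, 0.2]
synth = [0.4, 0.4, 0.2]

print("MAE :", mae(real, synth))
print("RMSE:", rmse(real, synth))
print("JSD :", js_distance(real, synth))
print("EMD :", emd_1d(real, synth))
print("Entropy (real / synthetic):", entropy(real), entropy(synth))
```

MAE and RMSE compare the shares category by category, whereas JSD and EMD compare the distributions as a whole; EMD additionally depends on the ordering and spacing of the categories, so it is only meaningful for ordinal or numeric attributes.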
Tzachristas et al. (Sun,) studied this question.