July 23, 2025

LLMs in the Lab: Can AI Predict What Real Participants Do?

Key Points

LLMs can replicate the effect direction and significance found in real human data from randomized controlled trials.
The study compared multiple LLMs (ChatGPT, Gemini, Perplexity) across distinct experimental domains, including a math reasoning task and a social judgment task.
Findings indicate that LLM-simulated datasets provide insights for refining study designs and enhancing research robustness.
This approach offers a collaborative tool, complementing traditional empirical studies without replacing them.

Abstract

Can large language models (LLMs) simulate participant-level datasets from experimental designs whose statistical properties, such as effect directions, magnitudes, and significance, align with those of actual human data? In this work, we tested whether LLMs can generate simulated datasets that reproduce the core findings of real randomized controlled trials (RCTs) using only the information provided in a study’s pre-registration. We assessed whether this alignment generalizes across different LLMs (ChatGPT, Gemini, Perplexity) and across distinct experimental domains, including a math reasoning task comparing student performance and a social judgment task. We found that LLM-simulated datasets mirrored the real data in effect direction and successfully recovered the original patterns of statistical significance. While LLMs cannot replace empirical studies, our results suggest they offer a powerful and flexible complement capable of accelerating idea testing, refining study designs, and probing the robustness of research findings before real-world experiments are conducted.
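To make the alignment criteria concrete, here is a minimal Python sketch of how effect direction and significance might be compared between a real and an LLM-simulated two-arm dataset. It is not the authors' analysis pipeline: the function names (`summarize_effect`, `aligns`), the 0.05 threshold, and the outcome scores are all hypothetical, and the p-value uses a normal approximation to a Welch-style statistic, which is rough for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def summarize_effect(treatment, control, alpha=0.05):
    """Summarize a two-arm comparison: direction, Cohen's d, approximate p-value."""
    n_t, n_c = len(treatment), len(control)
    diff = mean(treatment) - mean(control)
    var_t, var_c = variance(treatment), variance(control)

    # Cohen's d with a pooled standard deviation.
    pooled_sd = sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    cohens_d = diff / pooled_sd

    # Welch-style standard error; p-value from a normal approximation
    # (an assumption made here to stay within the standard library).
    z = diff / sqrt(var_t / n_t + var_c / n_c)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "direction": (diff > 0) - (diff < 0),  # +1, 0, or -1
        "cohens_d": cohens_d,
        "p_value": p_value,
        "significant": p_value < alpha,
    }


def aligns(real, simulated):
    """Check whether two effect summaries agree in direction and significance."""
    return {
        "same_direction": real["direction"] == simulated["direction"],
        "same_significance": real["significant"] == simulated["significant"],
    }


# Made-up illustrative scores, NOT data from the study.
real = summarize_effect([7, 8, 6, 9, 8, 7, 9, 8], [5, 6, 5, 7, 6, 5, 6, 6])
simulated = summarize_effect([8, 7, 9, 8, 7, 8, 9, 7], [6, 5, 6, 7, 5, 6, 6, 5])
result = aligns(real, simulated)
print(result)
```

Comparing `cohens_d` across the two summaries would be one way to examine agreement in effect magnitude as well, though what counts as "close enough" is a judgment the sketch leaves open.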
