Large Language Models (LLMs) are becoming increasingly capable of persuading and even manipulating humans, with the potential to shape beliefs, behaviour, and public discourse at scale. These capabilities have been highlighted as malicious-use risks in the International AI Safety Report (Bengio et al. 2025), and increasingly impactful AI regulations now aim to assess and mitigate them while preserving potential benefits. It is therefore critical that LLM persuasiveness and related capabilities are thoroughly evaluated and well understood. To date, dozens of empirical studies have assessed LLM persuasion using human participants. Such evaluations are costly, logistically complex, hard to scale, and constrained by ethical challenges, making them impractical for the systematic evaluation of rapidly evolving LLMs. As an alternative, a growing body of work has proposed fully automated evaluation approaches that require no human involvement. In this structured review, we provide a systematic taxonomy of such automated approaches across 30 methods from 27 papers, examine how they are validated against human judgement, and discuss their limitations and risks. We find the field fragmented, and human validation limited and inconsistent: alignment with human judgement is strong for argument assessment but mixed for belief and behavioural change metrics, raising concerns about reliance on synthetic proxies for safety-critical assessment. Despite these limitations, automated methods offer value beyond fully replacing human studies, for applications such as preliminary screening and high-risk scenario testing. We conclude by outlining directions for future research in this fast-moving field, and accompany the review with a living open-access resource hub.
Dementaviciute et al. studied this question.