Abstract: Large Language Models (LLMs) have the potential to produce content that is effective at persuading, deceiving, and manipulating people. Here we survey the possible risks of systems with these capabilities, including criminal fraud, political misinformation, addictive AI companions, and misaligned autonomous systems. We then survey the rapidly growing body of empirical work on their propensity to deceive and their capacity to persuade, which suggests that models are already roughly as persuasive as untrained human participants. We review proposed mitigations for these risks (including training models to be truthful or monitoring their hidden states) and highlight strengths and weaknesses of each approach. Finally, we highlight five key open questions for future research: How persuasive could AI systems be? How do AI systems persuade? What broader social impacts could AI persuasion have? Does persuasion advance truth? And how effective are proposed mitigations?
Jones et al. studied this question.