What question did this study set out to answer?

To develop a framework that detects mind manipulation tactics in dialogue and infers intent for improved detection accuracy.

June 18, 2026 | Open Access

Guard your mind: Mind manipulation detection via multi-agent interaction and fine-grained stepwise reasoning

Key Points

Developed a framework that detects mind manipulation tactics in dialogue and infers intent to improve detection accuracy.
Curated a balanced dataset of 776 dialogues (388 manipulative, 388 non-manipulative).
Utilized a multi-agent simulation with dual verification for dialogue generation.
Conducted a user study to evaluate the effectiveness of detected manipulation tactics and intent as rationale.
Achieved 84.01% accuracy for dialogue-level manipulation detection, outperforming the strongest baseline by +6.83%.
Attained 76.25% accuracy on tactic detection and 79.53% BERTScore for intent prediction.
Showed a +11.0% accuracy improvement in a forward-simulatability user study when the detected tactics and intent were provided as rationale for dialogue manipulation detection results.

Abstract

Mind manipulation in dialogue exploits emotional vulnerabilities via covert tactics and hidden goals. We present a fine-grained framework that (i) detects turn-level manipulation tactics and (ii) infers dialogue-level intent before issuing a final manipulation judgment. To support training and evaluation, we curate a balanced dataset of 776 dialogues (388 manipulative, 388 non-manipulative; 15520 dialogue turns) generated via a three-phase multi-agent simulation with dual verification. On this benchmark, our model achieves 84.01% accuracy for dialogue-level manipulation detection, outperforming the strongest baseline by +6.83%. It further attains 76.25% accuracy on tactic detection and 79.53% BERTScore for intent prediction. A user study on forward-simulatability shows +11.0% accuracy improvement when the tactics and intent detected by our model are provided as rationale for dialogue manipulation detection results. These results indicate that explicit, stepwise reasoning over tactics and intent yields both higher performance and actionable interpretability for proactive monitoring of manipulative conversations.
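The stepwise order described in the abstract (turn-level tactics first, then dialogue-level intent, then a final judgment) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Turn`, `Verdict`, `analyze`, `toy_detect_tactic`, and `toy_infer_intent` are hypothetical, the keyword rule stands in for a learned model, and the tactic label "guilt-tripping" is an illustrative placeholder rather than a category taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    speaker: str
    text: str
    tactic: Optional[str] = None  # filled in by step (i)


@dataclass
class Verdict:
    tactics: List[Optional[str]]  # one entry per turn, None if no tactic found
    intent: Optional[str]         # dialogue-level intent, None if none inferred
    manipulative: bool            # final dialogue-level judgment


def analyze(
    turns: List[Turn],
    detect_tactic: Callable[[Turn], Optional[str]],
    infer_intent: Callable[[List[Turn]], Optional[str]],
) -> Verdict:
    # Step (i): label every turn with a tactic (or None).
    for turn in turns:
        turn.tactic = detect_tactic(turn)
    # Step (ii): infer the dialogue-level intent from the labeled turns.
    intent = infer_intent(turns)
    # Final judgment. ASSUMPTION: a toy rule that flags the dialogue only
    # when at least one tactic and an intent were both found.
    tactics = [turn.tactic for turn in turns]
    manipulative = any(tactics) and intent is not None
    return Verdict(tactics, intent, manipulative)


# Hypothetical stand-ins for learned components (keyword rules only).
def toy_detect_tactic(turn: Turn) -> Optional[str]:
    if "after all i've done" in turn.text.lower():
        return "guilt-tripping"  # illustrative label, not the paper's taxonomy
    return None


def toy_infer_intent(turns: List[Turn]) -> Optional[str]:
    if any(turn.tactic for turn in turns):
        return "pressure the listener into compliance"
    return None


dialogue = [
    Turn("A", "Can you cover my shift on Friday?"),
    Turn("A", "After all I've done for you, you can't say no."),
]
verdict = analyze(dialogue, toy_detect_tactic, toy_infer_intent)
benign = analyze([Turn("B", "See you at lunch.")], toy_detect_tactic, toy_infer_intent)
```

Because the detectors are passed in as functions, the same skeleton would accept any tactic or intent model. The per-turn tactics and the inferred intent stay on the returned `Verdict`, which is the kind of rationale the user study evaluates.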
