Mind manipulation in dialogue exploits emotional vulnerabilities through covert tactics and hidden goals. We present a fine-grained framework that (i) detects turn-level manipulation tactics and (ii) infers dialogue-level intent before issuing a final manipulation judgment. To support training and evaluation, we curate a balanced dataset of 776 dialogues (388 manipulative, 388 non-manipulative; 15,520 dialogue turns) generated via a three-phase multi-agent simulation with dual verification. On this benchmark, our model achieves 84.01% accuracy on dialogue-level manipulation detection, outperforming the strongest baseline by 6.83%. It further attains 76.25% accuracy on tactic detection and a BERTScore of 79.53% on intent prediction. A user study on forward simulatability shows an 11.0% improvement in accuracy when the tactics and intent detected by our model are provided as the rationale for its manipulation detection results. These results indicate that explicit, stepwise reasoning over tactics and intent yields both higher performance and actionable interpretability for proactive monitoring of manipulative conversations.
Dai et al. studied this question.
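The stepwise structure described in the abstract (turn-level tactics first, then dialogue-level intent, then a final judgment) can be illustrated with a minimal sketch. This is not the authors' model: the tactic labels, the cue phrases, the function names, and the `min_tactic_turns` threshold are all hypothetical, and simple keyword matching stands in for the learned components.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical tactic labels and cue phrases (assumption: the abstract
# does not list the actual tactic taxonomy).
TACTIC_CUES: Dict[str, Tuple[str, ...]] = {
    "guilt_tripping": ("after all i did", "you owe me"),
    "gaslighting": ("you're imagining", "that never happened"),
}


@dataclass
class DialogueAnalysis:
    turn_tactics: List[Optional[str]]  # one entry per turn; None if no tactic found
    intent: str                        # free-text dialogue-level intent
    is_manipulative: bool              # final dialogue-level judgment


def detect_tactic(turn: str) -> Optional[str]:
    """Stage (i): turn-level tactic detection (keyword stand-in for a classifier)."""
    lowered = turn.lower()
    for tactic, cues in TACTIC_CUES.items():
        if any(cue in lowered for cue in cues):
            return tactic
    return None


def infer_intent(tactics: List[Optional[str]]) -> str:
    """Stage (ii): dialogue-level intent, summarized from the detected tactics."""
    found = sorted({t for t in tactics if t is not None})
    if not found:
        return "no hidden goal detected"
    return "possible hidden goal pursued via " + ", ".join(found)


def analyze_dialogue(turns: List[str], min_tactic_turns: int = 1) -> DialogueAnalysis:
    """Run both stages, then issue the final judgment from their outputs."""
    tactics = [detect_tactic(turn) for turn in turns]
    intent = infer_intent(tactics)
    flagged_turns = sum(t is not None for t in tactics)
    return DialogueAnalysis(tactics, intent, flagged_turns >= min_tactic_turns)


if __name__ == "__main__":
    example = [
        "I can't make it tonight.",
        "After all I did for you, you owe me this.",
    ]
    print(analyze_dialogue(example))
```

Returning the per-turn tactics and the inferred intent alongside the verdict mirrors the idea of exposing them as a rationale, which is what the user study in the abstract evaluates.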