Abstract
Social awareness is essential for effective interpersonal communication and informed decision-making, particularly in interactive and high-stakes environments. Emotional intelligence, the ability to recognize, understand, and regulate one’s own emotions while accurately interpreting and responding to the emotions of others, serves as a foundational component of social competence. This is especially critical in social deduction games, where players must navigate strategic deception, manage trust, and infer intentions through nuanced psychological cues. This study explores how large language models (LLMs) can mimic emotionally intelligent behavior in the context of Blood on the Clocktower, a dialogue-based social deduction game with complex and dynamic game states. In such games, players must skillfully decide when to reveal information, bluff, or mislead others, making mastery of both the game mechanics and social dynamics essential. We task an LLM with inferring the hidden game state solely through conversation with other players and selecting actions (dialogue, game-related, and role-specific decisions) based on its evolving understanding. We use GPT-4o both to generate high-quality training data and as a benchmark for evaluating performance. A smaller model, Mistral-7B-Instruct-v0.3, is first trained on this data and subsequently self-trained using interactions guided by Monte Carlo Tree Search (MCTS). We show that small LLMs can achieve competitive performance in social deduction settings by leveraging minimal but well-structured training data. Binary-branch MCTS proves sufficient for enabling models to find winning strategies. The trained Mistral-7B-Instruct-v0.3 model outperformed GPT-4o in our evaluation.
This result suggests that reinforcement-guided retraining can provide a scalable and effective pathway for developing models that mimic emotional intelligence in social deduction games, particularly in environments where nuanced dialogue and subtle social cues render manual prompt engineering ineffective or impractical.
Poglitsch et al. studied this question.