In complex driving environments, autonomous agents must interact with diverse road users whose behaviors are heterogeneous and often unpredictable. Traditional reinforcement learning (RL) methods struggle to remain robust when other agents act with adversarial or deceptive intent. We propose Adversarial Theory of Mind Reinforcement Learning (Adversarial ToM-RL), a framework that integrates cognitive modeling with adversarial training to improve agent resilience. Unlike prior adversarial RL methods, which perturb observations or dynamics, our method applies belief-level perturbations within a partially observable Markov decision process (POMDP) to simulate deceptive intent during Theory-of-Mind reasoning. In hybrid autonomous vehicle crossover scenarios, Adversarial ToM-RL reduces collision rates by 38% compared with standard ToM-RL and improves success rates by 8.3%. The method remains robust against malicious behaviors such as deceptive yielding and late-blocking, maintaining low collision rates and stable performance in adversarial traffic. These findings highlight the critical role of adversarial cognitive modeling in robust decision-making for security-sensitive multi-agent systems. The framework is general, model-agnostic, and compatible with existing ToM-RL pipelines.
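The abstract does not specify how belief-level perturbations are parameterized. As a minimal sketch, assuming a two-intent belief over another road user (yield or block), a Bayesian belief update, and an adversary that shifts up to `eps` probability mass toward a false intent, the example below shows how such a perturbation can flip an ego agent's decision in the way deceptive yielding would. The intents, the payoff table `Q`, and the functions `bayes_update`, `perturb_belief`, and `best_action` are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical illustration of a belief-level perturbation (not the paper's code).
INTENTS = ("yield", "block")   # hypothetical intents of the other road user
ACTIONS = ("go", "wait")       # hypothetical ego actions

# Hypothetical ego payoffs Q[action][intent]: going while the other blocks collides.
Q = {
    "go":   {"yield": 1.0, "block": -10.0},
    "wait": {"yield": -0.1, "block": 0.0},
}


def bayes_update(belief, likelihood):
    """Posterior over intents: b'(i) is proportional to b(i) * p(o | i)."""
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def perturb_belief(belief, src, dst, eps):
    """Move up to eps probability mass from intent index src to intent index dst.

    The result remains a valid probability distribution; this is an assumed,
    simple form of belief-level perturbation.
    """
    b = list(belief)
    shift = min(eps, b[src])
    b[src] -= shift
    b[dst] += shift
    return b


def best_action(belief):
    """Ego picks the action with the highest expected payoff under its belief."""
    def value(action):
        return sum(p * Q[action][i] for p, i in zip(belief, INTENTS))
    return max(ACTIONS, key=value)


if __name__ == "__main__":
    # Evidence favours "block", so the ego waits.
    belief = bayes_update([0.5, 0.5], likelihood=[0.2, 0.8])
    print(belief, best_action(belief))    # action: wait
    # Adversary fakes yielding by moving 0.75 mass from "block" to "yield".
    fooled = perturb_belief(belief, src=1, dst=0, eps=0.75)
    print(fooled, best_action(fooled))    # action: go (a collision if the agent truly blocks)
```

Under these assumptions, training against such perturbed beliefs would expose the ego policy to exactly the states where trusting an apparent yield is costly; how the paper chooses the perturbation is not stated here.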
Bi et al. studied this question.