March 3, 2026 | Open Access

Learning What They Pretend to Think: Adversarial ToM for Safety-Critical Driving Policies

Key Points

Adversarial ToM-RL reduces collision rates by 38% compared to standard ToM-RL.
In hybrid autonomous vehicle crossover scenarios, the framework also improves success rates by 8.3%.
Belief-level perturbations within a POMDP simulate deceptive intent in Theory-of-Mind reasoning, rather than perturbing observations or dynamics as in prior adversarial RL.
The results underscore the role of adversarial cognitive modeling in robust decision-making for complex, safety-critical multi-agent environments.

Abstract

In complex driving environments, autonomous agents must interact with diverse road users who exhibit heterogeneous and often unpredictable behaviors. Traditional reinforcement learning (RL) methods struggle to maintain robust performance in the presence of adversarial or deceptive intent. We propose Adversarial Theory of Mind Reinforcement Learning (Adversarial ToM-RL), a novel framework that integrates cognitive modeling with adversarial training to improve agent resilience. Unlike prior adversarial RL that perturbs observations or dynamics, our method operates on belief-level perturbations within a partially observable Markov decision process (POMDP) to simulate deceptive intent in Theory-of-Mind reasoning. Empirical results in hybrid autonomous vehicle crossover scenarios demonstrate that Adversarial ToM-RL reduces collision rates by 38% compared to standard ToM-RL and improves success rates by 8.3%. Our method shows strong robustness against malicious behaviors such as deceptive yielding and late-blocking, maintaining low collision rates and stable performance in adversarial traffic. These findings highlight the critical role of adversarial cognitive modeling in ensuring robust decision-making for security-sensitive multi-agent systems. The framework is general, model-agnostic, and compatible with existing ToM-RL pipelines.
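To make the idea of a belief-level perturbation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-intent belief ("yield" vs. "go") about another driver, a standard Bayesian update, and a hypothetical adversary that may shift at most `epsilon` of probability mass toward a target intent. The function names, the intent labels, the likelihood values, and the mass-shifting budget are all illustrative assumptions; the abstract does not specify how perturbations are constructed or bounded.

```python
from typing import Dict

Belief = Dict[str, float]  # maps an intent label to its probability


def bayes_update(belief: Belief, likelihood: Belief) -> Belief:
    """Posterior over intents, given the per-intent likelihood of one observation."""
    unnormalized = {k: belief[k] * likelihood[k] for k in belief}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


def perturb_belief(belief: Belief, target: str, epsilon: float) -> Belief:
    """Move up to `epsilon` probability mass toward `target` (assumed budget).

    Mass is taken from the other intents in proportion to their current
    probability, so the result is still a valid distribution.
    """
    others = [k for k in belief if k != target]
    available = sum(belief[k] for k in others)
    shift = min(epsilon, available)
    perturbed = dict(belief)
    if available > 0:
        for k in others:
            perturbed[k] = belief[k] - shift * belief[k] / available
    perturbed[target] = belief[target] + shift
    return perturbed


if __name__ == "__main__":
    prior = {"yield": 0.5, "go": 0.5}
    # Hypothetical likelihood of observing "other car slows down" under each intent.
    slowing = {"yield": 0.8, "go": 0.2}
    posterior = bayes_update(prior, slowing)
    # Deceptive yielding: the adversary pushes the ego belief toward "yield".
    deceived = perturb_belief(posterior, target="yield", epsilon=0.15)
    print("posterior:", posterior)
    print("perturbed:", deceived)
```

In this toy setting, a policy trained only on the clean posterior never sees a belief that is more confident in "yield" than the evidence supports. Training against the perturbed belief is one plausible way to expose the policy to deceptive yielding, under the assumptions stated above.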
