Robustly evaluating the safety and operational domain of autonomous vehicles (AVs) demands large-scale traffic scenarios that are both realistic and adversarial. In this work, we present a framework for generating adversarial yet feasible scenarios by explicitly controlling critical background vehicles (CBVs) whose behaviours are informed by formal traffic ontologies. The proposed pipeline integrates (i) real-time AV feasibility constraints, which guarantee kinematic and dynamical admissibility, and (ii) structured semantic knowledge, which ensures compliance with traffic rules and social conventions. The resulting scenarios are adversarial, physically plausible, and semantically valid. To stabilise learning under the sparse and noisy reward signals inherent to adversarial generation, we adopt a Dual-Clip Proximal Policy Optimisation (PPO) scheme with adaptive clipping bounds and curiosity-driven exploration bonuses. Extensive experiments on the CARLA Town05 and Town02 intersection benchmarks show that our CBV policies significantly outperform state-of-the-art baselines, including standard PPO, FPPO-RS, and FREA. Quantitatively, collision rates are reduced to 3.5% and 4.2%, an approximately 30% reduction relative to the best baseline, and rule-compliance scores improve by up to 12%. In addition, our method produces smoother and more varied interactive behaviours, indicating greater interaction diversity than the baselines. Cross-policy generalisation tests with the Expert and PlanT AV controllers confirm consistent improvements in collision avoidance, infeasibility reduction, and overall scenario quality.
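To make the Dual-Clip PPO objective concrete, the following is a minimal sketch of the standard dual-clip surrogate loss with fixed clipping bounds. It is an illustration under stated assumptions, not the implementation used in this work: the function name `dual_clip_ppo_loss`, the parameters `eps` and `dual_clip`, and their default values are hypothetical, and the adaptive clipping bounds and curiosity-driven bonuses described above are not modelled.

```python
import numpy as np


def dual_clip_ppo_loss(log_prob_new, log_prob_old, advantages,
                       eps=0.2, dual_clip=3.0):
    """Standard dual-clip PPO surrogate loss (to be minimised).

    Assumed, illustrative parameters:
      eps       -- usual PPO clipping half-width around a ratio of 1
      dual_clip -- constant c > 1 that caps the objective from below
                   when the advantage is negative
    """
    log_prob_new = np.asarray(log_prob_new, dtype=float)
    log_prob_old = np.asarray(log_prob_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Importance ratio between the new and the old policy.
    ratio = np.exp(log_prob_new - log_prob_old)

    # Ordinary PPO clipped surrogate.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    ppo_objective = np.minimum(unclipped, clipped)

    # Second clip: for negative advantages, a very large ratio would make
    # the objective unboundedly negative, so bound it by c * advantage.
    lower_bound = dual_clip * advantages
    objective = np.where(advantages < 0.0,
                         np.maximum(ppo_objective, lower_bound),
                         ppo_objective)

    # Return a loss, i.e. the negated mean objective.
    return -objective.mean()
```

With these assumed defaults, a sample with ratio 10 and advantage -1 contributes an objective of -3 rather than -10, which is the variance-limiting effect the second clip is meant to provide under noisy rewards.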
Wang et al. (Wed,) studied this question.