Abstract Understanding and analyzing complex, dynamic interactions in competitive environments remains a critical challenge in intelligent visual systems. Recent advances in spatiotemporal modeling and multi-agent reasoning underscore the importance of structured, context-aware solutions, especially in domains requiring precise interpretation of physical motion, strategic intent, and inter-agent behavior. Conventional approaches often fall short by either focusing narrowly on pose-based features or neglecting the intricate temporal and relational dependencies that govern interactive dynamics, hindering their applicability in environments where semantic richness and real-time responsiveness are crucial. Our method introduces the Spatio-Competitive Attention Network (SCAN), which processes agent-centric and interaction-centric features in parallel, fuses them via a competitive attention mechanism, and supports prediction through temporally augmented memory. SCAN aligns low-level kinematics with high-level strategy using a hierarchical attention-guided design. We develop the Adversarial Contextual Reinforcement Strategy (ACRS), a training scheme that infuses domain-aligned constraints, semantic consistency, and adversarial role regularization to promote behavioral interpretability and robustness. Extensive experimental evaluations demonstrate the superiority of our approach in modeling interactive dynamics, capturing competition-induced behaviors, and maintaining semantic fidelity across complex scenarios. The proposed framework exemplifies a structured, learning-driven solution that aligns closely with ongoing efforts to advance human-centric computing, intelligent vision systems, and interpretable AI, emphasizing multi-agent interaction, strategic inference, and context-rich decision-making.
Zonghao Wang (Mon,) studied this question.