Behaviour anomaly detection in real-world environments faces critical challenges, including environmental noise, occlusions, sensor heterogeneity, and multimodality, that severely limit detection accuracy and robustness in practical deployments. Existing multi-modal approaches fail to transfer knowledge effectively across diverse data sources and lack resilience against the adversarial perturbations and distribution shifts encountered in dynamic real-world scenarios. This research therefore introduces the Anomaly Detection Using a Multi-modal Attention-Based Knowledge Distillation (AMAKD) framework, which integrates adversarial training with attention-based cross-modal knowledge distillation to improve detection robustness and computational efficiency. AMAKD employs specialized encoders to extract hierarchical representations from heterogeneous modalities, including RGB, thermal imagery, and skeletal pose sequences. A novel cross-modal attention mechanism then dynamically calibrates feature importance across modalities, facilitating selective knowledge transfer while suppressing modality-specific noise. Adversarial training is incorporated through perturbation-based augmentation to enhance model invariance to environmental variations and malicious attacks. Knowledge distillation transfers representations from an ensemble teacher network to a compact student architecture, reducing computation without sacrificing accuracy. Temporal consistency constraints enforce smoothness across sequential predictions, reducing false alarm rates. A comprehensive evaluation on benchmark datasets shows that AMAKD improves AUC by +2.7%, achieves 97.5% mean detection accuracy, and gains +3.2% in adversarial robustness metrics and +5.8% in Temporal-F1 score, establishing its efficacy for deployment in safety-critical surveillance applications.
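The abstract does not give the loss functions, so the following is only a minimal sketch of how two of the described components could look: distillation from an ensemble of teachers into a student, and a temporal smoothness penalty on sequential predictions. It assumes a standard temperature-softened KL divergence with simple averaging of teacher outputs and a mean squared difference between consecutive anomaly scores. The function names, the averaging rule, and the default temperature are illustrative assumptions, not details taken from AMAKD.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """KL(averaged teachers || student) on softened distributions.

    Assumption: teachers are combined by plain averaging of their softened
    probabilities; the paper may weight them differently (e.g. via attention).
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n_teachers = len(teacher_probs)
    n_classes = len(student_logits)
    avg_teacher = [
        sum(p[i] for p in teacher_probs) / n_teachers for i in range(n_classes)
    ]
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q) for p, q in zip(avg_teacher, student_probs) if p > 0
    )
    # The T^2 factor is the usual rescaling for temperature-softened targets.
    return temperature ** 2 * kl


def temporal_smoothness(scores):
    """Mean squared difference between consecutive anomaly scores."""
    if len(scores) < 2:
        return 0.0
    diffs = [(b - a) ** 2 for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs)
```

Under these assumptions, a student that matches its teachers exactly incurs zero distillation loss, and a constant score sequence incurs zero smoothness penalty. In training, both terms would typically be added to the main detection loss with tunable weights.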
• AMAKD integrates cross-modal attention and knowledge distillation for robustness.
• Achieves 97.5% accuracy and +3.2% gain in adversarial robustness on benchmarks.
• Multi-teacher distillation enhances efficiency with 56.5% FLOPs reduction.
• Adversarial training ensures resilience under environmental and attack variations.
• Real-time performance: 29.4 FPS with 34 ms latency on edge devices.
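The efficiency figures in the highlights can be related to one another with simple arithmetic. The sketch below assumes that latency is measured per frame without pipelining and that the FLOPs reduction is relative to the teacher ensemble. Neither assumption is stated in the highlights, and the helper names are hypothetical.

```python
def latency_ms_from_fps(fps):
    """Per-frame latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


def relative_cost(reduction_percent):
    """Fraction of the baseline FLOPs that remains after a given reduction."""
    return 1.0 - reduction_percent / 100.0


print(round(latency_ms_from_fps(29.4), 1))   # 34.0
print(round(relative_cost(56.5), 3))         # 0.435
print(round(1.0 / relative_cost(56.5), 2))   # 2.3
```

Under these assumptions, 29.4 FPS corresponds to roughly 34 ms per frame, which matches the reported latency, and a 56.5% FLOPs reduction leaves the student with about 43.5% of the baseline cost, or roughly 2.3 times fewer operations.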
Maram Fahaad Almufareh (Wed,) studied this question.