March 3, 2026 | Open Access

Multimodal Neural Network Framework for Individual and Social Activity Recognition in UAV Surveillance

Key Points

The framework achieved 83.6% accuracy for individual actions and 91.5% accuracy for social activities.
Evaluation on the Okutama-Action UAV dataset shows a reported 15.24 percentage point improvement over existing UAV-specific methods.
Integration of deep learning techniques, including graph convolutional networks, enhances activity recognition performance.
Significant implications exist for improving UAV surveillance systems and analyzing human behavior.

Abstract

Understanding human activities in complex social environments from aerial perspectives is a critical challenge in UAV-based surveillance and autonomous systems. We propose a comprehensive multimodal framework that integrates appearance-based and skeletal feature representations for robust social activity recognition. The system employs atmospheric correction, DeepLabv3 segmentation, Mask R-CNN detection, and DeepSORT tracking for preprocessing. Feature extraction combines PDE-based shape analysis, distance transforms, and heatmap representations with skeletal features, including information landscape analysis, UMAP manifold projection, and motion signatures. Our novel Feature Correlation and Structure Fusion (FC2FS) methodology integrates these heterogeneous modalities. Spatial relationships are modeled using Relational Graph Convolutional Networks with multi-head attention, while Bidirectional LSTM networks capture temporal dependencies. Maximum Entropy Markov Models enable simultaneous individual and social activity classification. Evaluation on the Okutama-Action UAV dataset achieved 83.6% accuracy for individual actions and 91.5% for social activities, while the JRDB-Act robotics dataset yielded 85.7% and 93.2% accuracy, respectively. The framework demonstrates a 15.24 percentage point improvement over existing UAV-specific methods and establishes new performance benchmarks, with computational efficiency suitable for near real-time deployment. These results have significant implications for surveillance systems, autonomous robotics, and human behavior analysis.
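
As a rough illustration of one appearance cue named in the abstract, the distance transform, the sketch below computes a two-pass city-block distance transform on a toy binary silhouette. The abstract does not specify the metric or the implementation, so the city-block metric, the function name `distance_transform`, and the 5x5 mask are assumptions made purely for illustration, not the authors' method.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel (1) to the nearest
    background pixel (0). Background pixels get distance 0.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rows, cols = len(mask), len(mask[0])
    inf = rows + cols  # exceeds any real city-block distance in the grid
    dist = [[0 if mask[r][c] == 0 else inf for c in range(cols)]
            for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


# Toy silhouette (assumed data): a 3x3 foreground block inside a 5x5 frame.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(mask)

# One simple shape descriptor: the largest interior distance, which
# roughly tracks the half-thickness of the silhouette.
max_depth = max(max(row) for row in dist)
```

In a pipeline like the one described, a map of this kind would be computed per segmented person mask, and summary values such as `max_depth` could serve as one component of an appearance feature vector. How the authors actually encode or pool the transform is not stated in the abstract.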
