Human Action Recognition (HAR) has evolved from traditional handcrafted feature methods to modern data-driven approaches that leverage machine learning and deep learning. Early systems generalized poorly under realistic conditions because of occlusions, motion complexity, and background noise. This project addresses these limitations by proposing a hybrid framework that merges traditional Machine Learning (ML) with advanced Deep Learning (DL) models to detect human actions from video data. Two architectures are deployed and compared: the Long-term Recurrent Convolutional Network (LRCN), which combines CNNs and LSTMs to capture spatial and temporal patterns, and a streamlined pose-based classifier that uses Google's MoveNet for real-time skeleton tracking. Both models are trained and evaluated on the UCF101 and HMDB51 benchmark datasets. Experimental results show that while LRCN achieves higher accuracy (~87.6%), the MoveNet model offers superior inference speed and robustness to noise, making it suitable for real-time applications. The findings highlight key trade-offs between accuracy and latency, providing insights for deploying HAR systems across diverse domains such as surveillance, healthcare, and human-computer interaction.
Mrs. Subhashree D C (Tue,) studied this question.
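To make the pose-based branch described in the abstract more concrete, the following is a minimal sketch of how per-frame skeleton keypoints might be turned into classifier features. It assumes MoveNet's 17-keypoint output in COCO order, with each keypoint given as (y, x, score). The normalization scheme (centering on the hip midpoint, scaling by torso length, zeroing low-confidence points), the function names `normalize_pose` and `clip_features`, and the `min_score` threshold are all hypothetical illustrations; the abstract does not specify the preprocessing actually used.

```python
from math import hypot

# COCO keypoint indices as used by MoveNet's 17-point output.
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6
LEFT_HIP, RIGHT_HIP = 11, 12
NUM_KEYPOINTS = 17


def _midpoint(a, b):
    """Midpoint of two keypoints, using only their (y, x) coordinates."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def normalize_pose(keypoints, min_score=0.3):
    """Convert one frame of (y, x, score) keypoints into a flat feature list.

    Assumed (hypothetical) preprocessing: coordinates are centered on the
    hip midpoint and divided by the torso length, so the features do not
    depend on where the person stands or how large they appear. Keypoints
    with a score below min_score are replaced by zeros.
    """
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}")

    hip = _midpoint(keypoints[LEFT_HIP], keypoints[RIGHT_HIP])
    shoulder = _midpoint(keypoints[LEFT_SHOULDER], keypoints[RIGHT_SHOULDER])
    # Fall back to 1.0 if the torso collapses to a point, to avoid dividing by zero.
    torso = hypot(shoulder[0] - hip[0], shoulder[1] - hip[1]) or 1.0

    features = []
    for y, x, score in keypoints:
        if score < min_score:
            features.extend((0.0, 0.0))
        else:
            features.extend(((y - hip[0]) / torso, (x - hip[1]) / torso))
    return features


def clip_features(frames, min_score=0.3):
    """Concatenate per-frame features for a clip (a list of keypoint frames)."""
    out = []
    for frame in frames:
        out.extend(normalize_pose(frame, min_score))
    return out
```

Each frame yields 34 values (17 keypoints, two coordinates each), so a clip of T frames yields a vector of length 34·T that could be passed to a lightweight classifier. Working from a few dozen numbers per frame rather than full images is one plausible reason a pose-based model would run faster than LRCN, though the trade-off reported above is the authors' empirical finding, not something this sketch demonstrates.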