March 7, 2026 | Open Access

Visual intelligence for efficient human action recognition in human computers interaction applications

Key Points

The aim is to develop an efficient human action recognition model to enhance human-computer interaction.
Proposed a model using deep neural networks, combining CNNs and RNNs.
Utilized a pre-trained EfficientNetB7 for spatial feature extraction from video frames.
Employed an LSTM network for capturing long-range temporal dependencies.
Conducted experiments using UCF101 and HMDB51 datasets.
Achieved a classification accuracy of 97.8% on the UCF101 dataset.
Achieved a classification accuracy of 80.1% on the HMDB51 dataset.
Outperformed state-of-the-art human action recognition models.

Abstract

Human Action Recognition (HAR) is a pivotal area in computer vision, video surveillance, and human-computer interaction (HCI), driven by the need for efficient and accurate models to enhance HCI experiences. Traditional HAR methods often rely on hand-crafted features and shallow learning techniques, which limits their ability to capture complex patterns. In contrast, this study proposes an efficient HAR model that leverages deep neural networks, specifically a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to enhance HCI through AI-powered action understanding. The model employs a pre-trained EfficientNetB7 network to extract rich spatial features from video frames, followed by a Long Short-Term Memory (LSTM) network to capture long-range temporal dependencies. This architecture enhances recognition accuracy while reducing computational complexity, making it highly suitable for HCI applications. Experimental results demonstrate the superior performance of the model, achieving a classification accuracy of 97.8% on the UCF101 dataset and 80.1% on the HMDB51 dataset, outperforming state-of-the-art HAR models. The proposed model eliminates the need for auxiliary assistive techniques like data augmentation, highlighting its efficiency and tremendous potential for real-world HCI applications that rely on accurate and efficient recognition of human actions.
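The data flow described in the abstract (per-frame spatial features, then an LSTM over the frame sequence, then a class prediction) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the random weights, the tiny dimensions, and the helper names (`init_lstm`, `lstm_step`, `classify_clip`) are all assumptions made for illustration, and the per-frame feature vectors are random stand-ins for what a pre-trained EfficientNetB7 would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_lstm(input_dim, hidden_dim, rng):
    """Hypothetical helper: small random weights for the four LSTM gates
    (i = input, f = forget, g = cell candidate, o = output)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {
            "W": mat(hidden_dim, input_dim),   # input-to-hidden weights
            "U": mat(hidden_dim, hidden_dim),  # hidden-to-hidden weights
            "b": [0.0] * hidden_dim,
        }
        for gate in "ifgo"
    }


def lstm_step(params, x, h, c):
    """One standard LSTM update for a single frame's feature vector x."""
    def gate(name, activation):
        p = params[name]
        pre = [a + b + bias
               for a, b, bias in zip(matvec(p["W"], x), matvec(p["U"], h), p["b"])]
        return [activation(z) for z in pre]

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    g = gate("g", math.tanh)
    o = gate("o", sigmoid)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classify_clip(frame_features, lstm_params, w_out):
    """Run the LSTM over per-frame features and classify from the last hidden state."""
    hidden_dim = len(w_out[0])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frame_features:  # frames are processed in temporal order
        h, c = lstm_step(lstm_params, x, h, c)
    return softmax(matvec(w_out, h))


if __name__ == "__main__":
    rng = random.Random(0)
    feat_dim, hidden_dim, num_classes, num_frames = 8, 4, 5, 6  # toy sizes (assumed)
    # Stand-in for CNN features: one vector per sampled video frame.
    clip = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(num_frames)]
    lstm_params = init_lstm(feat_dim, hidden_dim, rng)
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
             for _ in range(num_classes)]
    print(classify_clip(clip, lstm_params, w_out))  # class probabilities
```

In a real pipeline the stand-in features would come from the frozen or fine-tuned CNN backbone, and the LSTM and output weights would be learned from labeled clips (for example, the 101 classes of UCF101) rather than drawn at random.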
