What question did this study set out to answer?

The aim is to develop a rapid and accurate method for recognizing human motion in real-time scenarios.

January 25, 2026

YOLO11-AN: An Efficient Human Motion Recognition Method for Real-Time Applications

Key Points

The aim is to develop a rapid and accurate method for recognizing human motion in real-time scenarios.
Proposed YOLO11-AN (Action Net), a lightweight detector for single-frame human action recognition.
Combined a C3K2-DMAF dynamic multiscale fusion block, a dual-branch AUX head, and a LocalWindowAttention module.
Adopted an MPDIoU bounding-box regression loss to improve box regression.
Evaluated on Pascal VOC 2012, UCF101, and HMDB51.
Achieved 0.537 mAP@50 on Pascal VOC 2012, 1.7 percentage points above the YOLO11 baseline.
Kept inter-seed variance below 0.001.
Offered the best accuracy–compute tradeoff against YOLOv8-n, PP-YOLOE-Tiny, and RT-DETR-R18.
Sustained 15.8 FPS after INT8 quantization on a 4 GB Jetson Orin Nano.
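The key points name the MPDIoU regression loss without defining it. The sketch below follows the formulation commonly given for MPDIoU (minimum point distance IoU): the IoU of the two boxes minus the squared distance between their top-left corners and the squared distance between their bottom-right corners, each normalized by the squared image diagonal (w² + h²). The loss is 1 − MPDIoU. The `Box` type and the function names are illustrative assumptions, and YOLO11-AN's actual implementation (batched tensors, weighting against other loss terms) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Plain intersection-over-union of two boxes."""
    # Width and height of the intersection; zero when the boxes do not overlap.
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """Return 1 - MPDIoU for one predicted / ground-truth pair (illustrative sketch)."""
    # Squared distances between matching top-left and bottom-right corners.
    d1_sq = (pred.x1 - gt.x1) ** 2 + (pred.y1 - gt.y1) ** 2
    d2_sq = (pred.x2 - gt.x2) ** 2 + (pred.y2 - gt.y2) ** 2
    # Both distances are normalized by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou(pred, gt) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


if __name__ == "__main__":
    gt = Box(10, 10, 50, 50)
    # Prediction shifted 10 px right: IoU = 0.6, each corner term = 100 / 20000.
    print(mpdiou_loss(Box(20, 10, 60, 50), gt, 100, 100))
```

For the shifted box the loss is about 0.41 on a 100 × 100 image. Unlike plain 1 − IoU, which is stuck at 1 for every non-overlapping pair, the corner-distance terms keep growing as the boxes move apart, so the loss still ranks disjoint predictions.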

Abstract

In high-frame-rate human–computer interaction and mobile-perception scenarios, single-frame human action recognition must meet stringent latency and accuracy constraints. To tackle spatial feature entanglement, multiscale fragmentation, and edge-deployment inefficiency, this study proposes YOLO11-AN (Action Net), a lightweight detector that couples a C3K2-DMAF dynamic multiscale fusion block, a dual-branch AUX head, an MPDIoU regression loss, and a LocalWindowAttention module. Comprehensive evaluations on Pascal VOC 2012, UCF101, and HMDB51 show that YOLO11-AN attains 0.537 mAP@50 on VOC, an absolute gain of 1.7 percentage points over the YOLO11 baseline, while keeping inter-seed variance below 0.001. Against peer-reviewed baselines (YOLOv8-n, PP-YOLOE-Tiny, and RT-DETR-R18), it offers the best accuracy–compute tradeoff and, after INT8 quantization, sustains 15.8 FPS on a 4 GB Jetson Orin Nano, validating its suitability for real-time, low-power deployment.
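The abstract reports INT8 results on the Jetson Orin Nano but does not say which toolchain or calibration scheme was used. As a generic illustration only, not the study's procedure, symmetric per-tensor quantization maps floats to 8-bit integers with a single scale so that the largest magnitude lands on ±127. The function names are hypothetical; production toolchains usually calibrate the range on sample data and may quantize per channel instead of using a plain maximum.

```python
def quantize_int8_symmetric(values: list[float]) -> tuple[list[int], float]:
    """Hypothetical helper: symmetric per-tensor INT8 quantization.

    Returns the integer codes in [-127, 127] and the float scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    weights = [1.0, -0.2, 0.0]
    codes, scale = quantize_int8_symmetric(weights)
    print(codes, scale)
    print(dequantize_int8(codes, scale))
```

Each value is reconstructed to within half a quantization step (scale / 2), which is the accuracy cost traded for 8-bit storage and integer arithmetic on the edge device.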
