What question did this study set out to answer?

The aim is to systematically compare the accuracy and usability of inertial measurement units and vision-based motion capture systems under similar conditions.

May 2, 2026

Comparative Evaluation of Inertial Measurement Unit versus Vision Based Motion Capture

Key Points

The study benchmarks the spatiotemporal accuracy, robustness, and usability of both systems across walking, running, squatting, and upper-limb reaching tasks.
Joint angles from the inertial measurement units were reconstructed with a complementary-filter fusion of accelerometer, gyroscope, and magnetometer signals.
The vision-based comparison covered RGB-camera methods built on convolutional layers and dual depth-based Kinect v2 cameras.
Inertial measurement units tracked lower-limb joint angles more consistently than the vision pipeline, but drifted gradually in prolonged recordings without magnetic updates.
Vision-based systems showed larger deviations in joint angle estimates, particularly during out-of-plane motion and brief self-occlusions.
The camera-only system was quicker to set up, although a streamlined calibration routine narrowed the preparation gap for inertial measurement units; vision hardware was less expensive, but its higher computational demands offset that advantage.

Abstract

Accurate motion capture is critical for biomechanics research, clinical gait analysis, human–computer interaction, and the entertainment industry. Two dominant paradigms exist: wearable systems, which use inertial measurement units affixed to the body, and marker-less, vision-based systems, which infer kinematics directly from camera data using deep-learning techniques. While both have matured rapidly, a systematic, head-to-head evaluation under identical experimental conditions is still lacking. This study benchmarks the spatiotemporal accuracy, robustness, and practical usability of state-of-the-art inertial measurement unit and vision-based pipelines across a representative set of motor tasks (walking, running, squatting, and upper-limb reaches) performed by five healthy adults. Joint angles from the inertial measurement units were reconstructed via a complementary-filter fusion of accelerometer, gyroscope, and magnetometer signals. The vision-based pipelines, representing the most advanced current methods, take images from RGB (red, green, blue) cameras and reconstruct joint angles through a series of convolutional layers. Dual depth-based Kinect v2 cameras were also included for comparison. Both the inertial measurement unit and vision-based pipelines estimated lower-limb joint angles with generally acceptable differences, yet the vision approach showed larger deviations, especially during out-of-plane motion and brief self-occlusions. The inertial measurement units, by contrast, remained more consistent in those scenarios but exhibited gradual drift during prolonged recordings when magnetic updates were absent. Although the camera-only system offered quicker setup and greater participant comfort, a streamlined calibration routine narrowed the preparation gap for inertial measurement units. Vision hardware was less expensive, but its higher computational demands offset that advantage.
Taken together, the study highlights a trade-off: marker-less vision prioritizes plug-and-play usability, whereas inertial measurement units deliver steadier, higher-precision tracking.
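
To make the complementary-filter idea mentioned in the abstract concrete, here is a minimal single-axis sketch in Python. It is not the study's implementation: the function names, the blend weight `alpha`, and the simulated gyroscope bias are all assumptions chosen for illustration, and only the accelerometer-gyroscope pair is shown. The filter integrates the gyroscope rate for short-term responsiveness and pulls the estimate toward the accelerometer's gravity-based tilt to cancel long-term drift.

```python
import math


def accel_tilt(ax, az):
    """Tilt angle (rad) about one axis, inferred from the gravity direction."""
    return math.atan2(ax, az)


def integrate_gyro(gyro_rates, dt, angle0=0.0):
    """Gyroscope-only estimate: plain integration, so any bias accumulates."""
    angle, out = angle0, []
    for rate in gyro_rates:
        angle += rate * dt
        out.append(angle)
    return out


def complementary_filter(gyro_rates, accels, dt, alpha=0.98, angle0=0.0):
    """Blend the integrated gyro angle with the accelerometer tilt.

    alpha (hypothetical value) weights the gyro path; 1 - alpha weights
    the accelerometer reference that corrects slow drift.
    """
    angle, out = angle0, []
    for rate, (ax, az) in zip(gyro_rates, accels):
        gyro_angle = angle + rate * dt      # short-term: integrate angular rate
        acc_angle = accel_tilt(ax, az)      # long-term: absolute tilt reference
        angle = alpha * gyro_angle + (1.0 - alpha) * acc_angle
        out.append(angle)
    return out


# Simulated stationary sensor: true tilt is 0 rad, gyro has a constant bias.
dt = 0.01                        # 100 Hz sampling (assumed)
n = 6000                         # 60 s of samples
gyro = [0.01] * n                # 0.01 rad/s bias, no real rotation
accel = [(0.0, 9.81)] * n        # gravity along z only, so tilt is 0

drifting = integrate_gyro(gyro, dt)
fused = complementary_filter(gyro, accel, dt)
print(f"gyro-only error after 60 s: {drifting[-1]:.3f} rad")
print(f"fused error after 60 s:     {fused[-1]:.4f} rad")
```

In this simulation the gyroscope-only estimate keeps drifting while the fused estimate stays bounded near the true tilt. Heading has no gravity reference, so in a full three-axis filter the magnetometer plays the correcting role there; without it, heading behaves like the gyroscope-only case, which is consistent with the drift the study reports when magnetic updates were absent.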

Cite This Study

Minas Aslanyan studied this question.

synapsesocial.com/papers/69f593f271405d493affec2e https://doi.org/10.1134/s1054661825700762
