Advancing both research and practical applications in human pose estimation requires a comprehensive evaluation of current frameworks. This paper reviews state-of-the-art 2D and 3D human pose estimation frameworks, analyzing 118 papers and four GitHub repositories, with a focus on frameworks released since 2019. The following frameworks were selected based on predefined inclusion criteria: AlphaPose, Detectron2, MediaPipe, MeTRAbs, MHFormer, MMPose, MoveNet, OpenPifPaf, OpenPifPaf-vita, OpenPose, PoseFormerV2, rtmlib, StridedTransformer-Pose3D, ultralytics (YOLOv8), ViTPose, and YOLOv7. These 16 frameworks are evaluated on an existing, unpublished dataset of exercise videos recorded with a monocular RGB camera and synchronized with gold-standard motion capture data. The dataset contains videos of nine individuals performing eight exercises, recorded from two camera views with different planar angles. Joint angle performance is quantified using the weighted mean absolute error and the weighted intraclass correlation coefficient. MeTRAbs emerged as the best overall framework, while AlphaPose, rtmlib, and YOLOv7 were the top 2D performers.
Kahl et al. studied this question.
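To make the joint angle metric concrete, the following minimal sketch shows one way a joint angle could be derived from three keypoints and how a weighted mean absolute error against motion capture reference angles might be computed. The function names `joint_angle` and `weighted_mae` are hypothetical, and the weighting scheme (one non-negative weight per sample, for example a frame count per trial) is an assumption; the abstract does not specify how the weights are defined.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by the segments b->a and b->c.

    Works for 2D or 3D keypoints given as equal-length coordinate sequences.
    """
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        raise ValueError("degenerate joint: two keypoints coincide")
    cos_theta = sum(ui * vi for ui, vi in zip(u, v)) / norm
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def weighted_mae(estimated, reference, weights):
    """Weighted mean absolute error between estimated and reference angles.

    Assumption: `weights` holds one non-negative weight per sample; the
    actual weighting used in the paper is not stated in the abstract.
    """
    if not (len(estimated) == len(reference) == len(weights)):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        w * abs(e - r) for e, r, w in zip(estimated, reference, weights)
    )
    return weighted_sum / total


# Example: a knee angle from hip, knee, and ankle keypoints (2D pixel coordinates).
knee = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))

# Example: two angle estimates compared with reference angles, weighted 1:3.
error = weighted_mae([10.0, 20.0], [12.0, 20.0], [1.0, 3.0])
```

With these inputs, `knee` is a right angle and `error` is 0.5 degrees, because only the first sample deviates (by 2 degrees) and it carries one quarter of the total weight.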