Human pose estimation (HPE) is a significant area of study within computer vision and artificial intelligence that locates human joints and estimates body pose from images or videos. At the theoretical level, HPE has driven progress in computer vision and deep learning, improved the efficiency and accuracy of neural networks, and supported the development of natural and efficient human-computer interaction systems. At the application level, HPE is valuable in many fields, including security monitoring, motion analysis and health management, VR and AR, robotics, film, television, and animation production, and intelligent and assisted driving. This paper divides 2D HPE methods into three categories: two-stage (top-down and bottom-up), single-stage, and other improvement strategies (Transformer-based methods and post-processing optimization methods). Although deep learning-based pose estimation algorithms have achieved remarkable results, they still face challenges such as prediction in complex scenes, crowding and joint occlusion, and lightweight model design. Future work should pay more attention to these challenges and extend pose estimation from 2D to 3D settings.
Haoyu Liu (Fri,) studied this question.