Two-dimensional (2D) human pose estimation is a key research direction in computer vision (CV), with broad application prospects in behavior recognition tasks such as gesture tracking, intelligent monitoring, and identity recognition. It has therefore attracted extensive attention from academia and industry in recent years. Although a large body of literature has been published, existing reviews often lack a unified theoretical perspective and fail to capture the latest paradigm shifts brought by foundation models. To this end, this paper reviews the applications of deep learning to 2D human pose estimation from 2010 to 2025 through a cascading approach. First, the mainstream human pose datasets are introduced, and the related evaluation metrics are formally defined through mathematical formulas. Subsequently, the performance of algorithms in single-person and multi-person scenarios is analyzed in depth, and the strengths and weaknesses of each algorithmic model are compared. This comparative analysis covers both traditional architectures and the latest deep learning breakthroughs, specifically highlighting Vision Foundation Models (VFMs), generative diffusion processes, and State Space Models (SSMs). Finally, the current state of research in 2D human pose estimation is summarized, and three main challenges, emerging solutions, and expected development trends are pointed out. This survey is an exhaustive compilation of existing research in 2D human pose estimation, providing a blueprint for researchers in the field and laying the foundation for future research work.
Lin et al. (Thu,) studied this question.