In this paper, we propose an end-to-end trainable regression approach for human pose estimation from still images. We use the proposed Soft-argmax function to convert feature maps directly to joint coordinates, resulting in a fully differentiable framework. Our method is able to learn heat maps indirectly, without the additional step of generating artificial ground truth. Consequently, contextual information can be included in the pose predictions in a seamless way. We evaluated our method on two very challenging datasets, the Leeds Sports Poses (LSP) and the MPII Human Pose datasets, reaching the best performance among all the existing regression methods and comparable results to the state-of-the-art detection-based approaches.
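To illustrate how a Soft-argmax can turn a feature map into joint coordinates while remaining differentiable, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single 2D feature map per joint, a spatial softmax over all pixels, and coordinates normalized to [0, 1] on a uniform grid; the function name `soft_argmax_2d` and the grid convention are assumptions made for this example.

```python
import numpy as np

def soft_argmax_2d(feature_map):
    """Expected (x, y) location of a 2D map under a spatial softmax.

    Coordinates are normalized to [0, 1] (an assumed convention).
    """
    h, w = feature_map.shape
    # Spatial softmax over all pixels; subtracting the max avoids overflow.
    z = feature_map - feature_map.max()
    p = np.exp(z)
    p = p / p.sum()
    # Normalized coordinate grids along each axis.
    xs = np.linspace(0.0, 1.0, w)
    ys = np.linspace(0.0, 1.0, h)
    # Expectation of each coordinate under the softmax distribution.
    x = float((p.sum(axis=0) * xs).sum())
    y = float((p.sum(axis=1) * ys).sum())
    return x, y
```

For a 5x5 map with one sharply dominant activation at row 2, column 3, this returns approximately (0.75, 0.5), while a constant map returns the center (0.5, 0.5). Because the output is a weighted average rather than a hard argmax, gradients can flow back to every pixel of the feature map, which is what lets heat maps be learned indirectly from coordinate supervision.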
Luvizon et al. (Fri,) studied this question.