MODEC: Multimodal Decomposable Models for Human Pose Estimation

Abstract

We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. These include our newly collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets.
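The cascaded mode selection step can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each mode has a cheap global score and a separate, more expensive local pose solver, and that the two scores simply add. The names `cascaded_mode_inference`, `mode_scores`, `pose_solvers`, and `keep` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a per-mode solver returns (local pose score, pose).
PoseSolver = Callable[[], Tuple[float, object]]


def cascaded_mode_inference(
    mode_scores: Sequence[float],
    pose_solvers: Sequence[PoseSolver],
    keep: int,
) -> Tuple[int, object, float]:
    """Return (mode index, pose, total score), running only `keep` solvers."""
    if len(mode_scores) != len(pose_solvers):
        raise ValueError("need exactly one pose solver per mode")
    if not 1 <= keep <= len(mode_scores):
        raise ValueError("keep must be between 1 and the number of modes")

    # Stage 1: rank modes by their cheap global score and prune to the top `keep`.
    ranked = sorted(range(len(mode_scores)),
                    key=lambda z: mode_scores[z], reverse=True)
    survivors = ranked[:keep]

    # Stage 2: run the expensive local pose inference only for surviving modes,
    # and keep the (mode, pose) pair with the highest combined score.
    best: Optional[Tuple[int, object, float]] = None
    for z in survivors:
        pose_score, pose = pose_solvers[z]()
        total = mode_scores[z] + pose_score  # assumed additive combination
        if best is None or total > best[2]:
            best = (z, pose, total)
    assert best is not None  # guaranteed because keep >= 1
    return best
```

In this sketch, `keep` is the knob that trades speed for accuracy: a small value runs fewer pose solvers but may prune the mode that would have scored best overall, while setting `keep` to the number of modes recovers exhaustive search.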
