What question did this study set out to answer?

The study asks how transformer-based human pose estimation can cut its computational cost without sacrificing accuracy.

June 17, 2026 | Open Access

Enhanced human pose estimation via self-distilled and token-pruned transformer

Key Points

This research aims to improve human pose estimation by reducing computational costs while maintaining performance.
Introduced SPTPose, a method utilizing self-distillation and token pruning.
Evaluated the performance on the MSCOCO validation set with a comparison to traditional CNN models.
Addressed computational efficiency by reducing model size and resource requirements.
SPTPose-B achieved a mean Average Precision (mAP) of 74.8% on the MSCOCO validation set.
SPTPose-B uses only 13.2 million parameters and 4.7 GFLOPs.

Abstract

Human pose estimation (HPE) is a fundamental challenge in computer vision that aims to detect anatomical keypoints in images. Traditional methods rely on CNN models, but recent Vision Transformer (ViT) models have shown superior performance. However, ViTs often require substantial computational resources. This paper introduces SPTPose, a method that employs self-distillation and token pruning to reduce computational costs while maintaining high performance. Our SPTPose-B achieves an mAP of 74.8% on the MSCOCO validation set with only 13.2 million parameters and 4.7 GFLOPs. The source code is available at https://github.com/duduxx123/SPTPose.
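
As a rough illustration of the token-pruning idea named in the abstract, the sketch below keeps only the highest-scoring tokens from a sequence. It is not taken from the SPTPose code: the function name `prune_tokens`, the `keep_ratio` parameter, and the toy scores are all hypothetical, and how the paper actually scores or selects tokens is not stated here. In a ViT, such scores might come from attention weights, which is an assumption.

```python
def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring tokens, preserving their original order.

    Hypothetical helper for illustration only; not the SPTPose implementation.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Always keep at least one token.
    num_keep = max(1, int(len(tokens) * keep_ratio))

    # Rank token indices by score, highest first (ties broken by position).
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))

    # Restore the original ordering of the surviving tokens.
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


# Four toy 2-D patch tokens with made-up importance scores (assumed values).
tokens = [[0.1, 0.2], [0.9, 0.8], [0.3, 0.1], [0.7, 0.6]]
scores = [0.05, 0.60, 0.10, 0.25]
pruned, kept_indices = prune_tokens(tokens, scores, keep_ratio=0.5)
```

Because the cost of self-attention grows quadratically with the number of tokens, passing a shorter sequence to later layers is one way a pruned transformer can lower its GFLOPs.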
