What question did this study set out to answer?

The study aims to improve human pose estimation by restoring the spatial constraints among keypoints that are lost when 2D keypoint features are reduced to 1D coordinate representations.

May 3, 2026

Enhanced Query Attention Constrained by Bi-directional Graphs for Human Pose Estimation Networks.

Key Points

The work targets the loss of spatial constraints among keypoints that occurs when keypoint localization is cast as classification over discretized 1D coordinates.
Two fundamental skeleton connection directions are defined and encoded as a pair of adjacency matrices.
A GCN-guided multi-scale feature fusion framework combines multi-scale visual features with structural priors.
A dual-gate module inside a GCN-guided attention unit builds a structured query matrix that filters out spurious joint interactions and emphasizes plausible ones.
The proposed method outperforms existing methods in accuracy and robustness on the Tai Chi Chuan-Pose, Animal-Pose, AP-10K, MPII, COCO, and COCO-WholeBody datasets.
The gains are most notable in balancing precise local keypoint localization with global pose consistency.

Abstract

In human pose estimation, formulating keypoint localization as a classification task over discretized coordinate grids has proven effective. Essentially, the 2D features of the keypoints are reduced to 1D coordinate representations. This process leads to the loss of spatial constraints among keypoints and increases the difficulty for the model to capture their structural relationships. To address this issue, we propose an enhanced query attention mechanism constrained by bidirectional graphs. The core idea is to establish the topological constraints on the 1D coordinate representations. First, two fundamental connection directions of the skeleton are defined and encoded as a pair of adjacency matrices to enhance the feature interaction capability of the graph convolutional network (GCN). Second, a GCN-guided multi-scale feature fusion framework is designed to effectively combine multi-scale visual features with structural priors, thereby enhancing the representation of keypoint spatial distributions. Finally, a dual-gate module is incorporated into a GCN-guided attention unit to construct a structured query matrix constrained by the bidirectional skeleton graphs, which helps filter out spurious joint interactions and emphasize plausible ones. Extensive experiments on Tai Chi Chuan-Pose, Animal-Pose, AP-10K, MPII, COCO, and COCO-WholeBody datasets demonstrate that the proposed method outperforms existing methods in terms of both accuracy and robustness, particularly in balancing precise local keypoint localization with global pose consistency.
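The two ideas at the center of the abstract, reading a coordinate off a 1D classification vector and passing features along the skeleton in two directions, can be illustrated with a small sketch. This is not the authors' implementation. The five-joint skeleton, the function names, the `bins_per_pixel` parameter, and the plain averaging of the two directions are all assumptions made for illustration. The sketch also omits learned GCN weights, the multi-scale fusion, and the dual-gate attention unit.

```python
# Hypothetical toy skeleton; the joints and edges used in the paper are not given here.
JOINTS = ["pelvis", "spine", "head", "left_hand", "right_hand"]
# (parent, child) pairs, directed from the root outward.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def directed_adjacency(num_joints, edges, reverse=False):
    """Row-normalized adjacency with self-loops.

    Row i marks the joints whose features flow into joint i.
    """
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for parent, child in edges:
        src, dst = (child, parent) if reverse else (parent, child)
        adj[dst][src] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def bidirectional_graph_step(features, adj_out, adj_in):
    """Average the messages propagated along the two skeleton directions.

    Assumption: a real GCN layer would apply learned weights per direction.
    """
    out_msg = matmul(adj_out, features)
    in_msg = matmul(adj_in, features)
    return [[0.5 * (o + i) for o, i in zip(row_out, row_in)]
            for row_out, row_in in zip(out_msg, in_msg)]


def decode_coordinate(bin_scores, bins_per_pixel=1.0):
    """Take the highest-scoring bin of a 1D coordinate classifier, in pixels."""
    best_bin = max(range(len(bin_scores)), key=bin_scores.__getitem__)
    return best_bin / bins_per_pixel


# One adjacency matrix per direction: root to extremities, and the reverse.
adj_out = directed_adjacency(len(JOINTS), EDGES)
adj_in = directed_adjacency(len(JOINTS), EDGES, reverse=True)

# One 1D score vector per joint (here: 4 bins along the x axis).
features = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

# Each joint's scores are now mixed with those of its skeleton neighbors.
mixed = bidirectional_graph_step(features, adj_out, adj_in)
x_coords = [decode_coordinate(row) for row in mixed]
```

Keeping the two directions in separate matrices lets a model treat parent-to-child and child-to-parent messages differently. A single symmetric adjacency matrix would not allow this.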


Cite This Study


synapsesocial.com/papers/69f6e5ac8071d4f1bdfc65c7 https://doi.org/10.1109/tip.2026.3687482
