March 24, 2024 | Open Access

Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

Abstract

Human-Object Interaction (HOI) detection plays a vital role in scene understanding; it aims to predict HOI triplets of the form ⟨human, action, object⟩. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation and ignore potential cross-triplet dependencies, which leads to ambiguous action predictions. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, to aggregate self-triplet correlations, we regard each triplet proposal as a graph in which the Human and Object are nodes and the Action is the edge. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. In addition, we leverage the CLIP model to help SCTC obtain interaction-aware features through knowledge distillation, which provides useful action cues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of the proposed SCTC.
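The triplet-as-graph view and the three cross-triplet relation levels can be illustrated with a small sketch. The abstract does not say how these relations are computed, so everything below is an assumption made for illustration and not the SCTC implementation: the names `Instance`, `TripletProposal`, and `cross_triplet_relations` are hypothetical, "instance-level" is read as sharing a detected human or object, "semantic-level" as sharing an object category, and "layout-level" as overlapping human-object union boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """A detected human or object, i.e. a graph node (hypothetical structure)."""
    instance_id: int
    category: str   # e.g. "person", "bicycle"
    box: tuple      # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class TripletProposal:
    """One (human, action, object) proposal: two nodes joined by an action edge."""
    human: Instance
    obj: Instance
    action: str = "unknown"  # edge label, to be predicted by a model


def union_box(t):
    """Smallest box covering both the human and the object of a proposal."""
    h, o = t.human.box, t.obj.box
    return (min(h[0], o[0]), min(h[1], o[1]), max(h[2], o[2]), max(h[3], o[3]))


def boxes_overlap(a, b):
    """True if two axis-aligned boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def cross_triplet_relations(a, b):
    """Return the (assumed) relation levels that link two triplet proposals."""
    relations = set()
    # Instance level (assumption): the proposals share a human or an object.
    if (a.human.instance_id == b.human.instance_id
            or a.obj.instance_id == b.obj.instance_id):
        relations.add("instance")
    # Semantic level (assumption): the objects belong to the same category.
    if a.obj.category == b.obj.category:
        relations.add("semantic")
    # Layout level (assumption): the human-object union boxes overlap.
    if boxes_overlap(union_box(a), union_box(b)):
        relations.add("layout")
    return relations


# Toy detections: one rider with a bicycle and a helmet, and a distant rider.
person = Instance(0, "person", (10, 10, 50, 100))
bike = Instance(1, "bicycle", (20, 60, 90, 120))
helmet = Instance(2, "helmet", (20, 5, 40, 20))
far_person = Instance(3, "person", (300, 10, 340, 100))
far_bike = Instance(4, "bicycle", (310, 60, 380, 120))

t1 = TripletProposal(person, bike)
t2 = TripletProposal(person, helmet)
t3 = TripletProposal(far_person, far_bike)

print(sorted(cross_triplet_relations(t1, t2)))
print(sorted(cross_triplet_relations(t1, t3)))
```

In this toy scene, the two proposals that share the same person are linked at the instance and layout levels, while the two bicycle proposals are linked only at the semantic level. A learned model would presumably pass features along such links instead of using hard rules like these.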

Authors

Weibo Jiang

Weihong Ren

Jiandong Tian

Institutions

University of Hong Kong

Harbin Institute of Technology

Shenyang Institute of Automation
