March 18, 2024 | Open Access

Contextual Human Object Interaction Understanding from Pre-Trained Large Language Model

Abstract

Existing human-object interaction (HOI) detection methods have introduced zero-shot learning techniques to recognize unseen interactions, but they remain limited in contextual understanding and comprehensive reasoning. To overcome these limitations, we propose a novel HOI learning framework, ContextHOI, which serves as an effective contextual HOI detector that enhances contextual understanding and zero-shot reasoning ability. The main contributions of ContextHOI are a novel context-mining decoder and a powerful interaction reasoning large language model (LLM). The context-mining decoder extracts linguistic contextual information from a pre-trained vision-language model. Based on the extracted context information, the interaction reasoning LLM further enhances zero-shot reasoning ability by leveraging rich linguistic knowledge. Extensive evaluation demonstrates that the proposed framework outperforms existing zero-shot methods on the HICO-DET and SWIG-HOI datasets, achieving up to 19.34% mAP on unseen interactions.
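To make the reported metric concrete, the following is a minimal sketch of how a mean average precision (mAP) restricted to unseen interaction classes can be computed. It is not the authors' evaluation code. It assumes each detection has already been marked as a true or false positive, whereas benchmark protocols first match predicted human and object boxes to ground truth and may use a different AP interpolation. The function names, class names, and scores are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth instance).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP for one interaction class.

    Detections are ranked by score; precision is accumulated at the rank of
    each true positive and divided by the number of ground-truth instances.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_gt


def mean_ap(per_class: Dict[str, Tuple[List[Detection], int]],
            class_subset: List[str]) -> float:
    """Mean AP over a subset of classes, e.g. only the unseen interactions.

    Classes without ground-truth instances are skipped (an assumption of
    this sketch).
    """
    aps = [
        average_precision(dets, num_gt)
        for name in class_subset
        for dets, num_gt in [per_class[name]]
        if num_gt > 0
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: class name -> (detections, ground-truth count).
per_class = {
    "ride horse": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "feed cat": ([(0.6, False), (0.5, True)], 1),
    "hold cup": ([(0.95, True)], 1),  # treated as a seen class here
}
unseen_classes = ["ride horse", "feed cat"]

unseen_map = mean_ap(per_class, unseen_classes)
print(f"unseen mAP: {unseen_map:.4f}")
```

Restricting `class_subset` to interactions that were held out during training is what separates an unseen-interaction score from a full-dataset score.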

Authors

Jianjun Gao

Kim-Hui Yap

Kejun Wu

Institutions

Nanyang Technological University
