Existing human-object interaction (HOI) detection methods have introduced zero-shot learning techniques to recognize unseen interactions, but they remain limited in contextual understanding and comprehensive reasoning. To overcome these limitations, we propose a novel HOI learning framework, ContextHOI, an effective contextual HOI detector that enhances contextual understanding and zero-shot reasoning ability. Its main contributions are a novel context-mining decoder and a powerful interaction reasoning large language model (LLM). The context-mining decoder extracts linguistic contextual information from a pre-trained vision-language model. Based on the extracted context information, the interaction reasoning LLM further enhances zero-shot reasoning ability by leveraging rich linguistic knowledge. Extensive evaluation demonstrates that the proposed framework outperforms existing zero-shot methods on the HICO-DET and SWIG-HOI datasets, achieving up to 19.34% mAP on unseen interactions.
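For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean average precision (mAP) over a set of interaction classes could be computed. It is not the authors' evaluation code: the function names, the input layout, and the use of non-interpolated AP are assumptions made for illustration, and benchmark toolkits may use an interpolated variant and their own box-matching rules to decide which detections count as true positives.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth instance).
# How a match is decided (e.g. box overlap thresholds) is assumed to happen upstream.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP for a single interaction class (illustrative)."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision measured at each rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / num_gt


def mean_ap(per_class: Dict[str, Tuple[List[Detection], int]],
            classes: List[str]) -> float:
    """Mean of per-class AP over a chosen subset, e.g. only unseen classes."""
    aps = [average_precision(*per_class[c]) for c in classes]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: two interaction classes treated as "unseen".
per_class = {
    "ride horse": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "feed horse": ([(0.6, False), (0.5, True)], 1),
}
unseen_map = mean_ap(per_class, ["ride horse", "feed horse"])
```

Restricting the `classes` argument to interactions withheld during training is what would make the result an unseen-interaction mAP in this sketch.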
Jianjun Gao
Kim-Hui Yap
Kejun Wu
Nanyang Technological University
Gao et al. studied this question.
www.synapsesocial.com/papers/68e7388db6db6435876b1d07 — DOI: https://doi.org/10.1109/icassp48485.2024.10447511