What question did this study set out to answer?

The aim is to improve 3D human-object interaction learning by integrating semantic knowledge with visual data.

April 22, 2026 | Open Access

Semantic knowledge enhanced 3D human–object interaction learning

Key Points

The aim is to improve 3D human-object interaction learning by integrating semantic knowledge with visual data.
The proposed SKE-3DHOI framework combines semantic knowledge from large multimodal models with visual features.
It generates 3D HOI semantic knowledge tensors by posing HOI-specific textual queries to large multimodal models.
Cross-attention fusion layers align visual patterns with these semantic knowledge priors.
In the reported experiments, SKE-3DHOI outperforms existing methods on 3D human-object interaction tasks across all metrics.
The authors report significant improvements in the accuracy and robustness of interaction modeling.

Abstract

Learning 3D human-object interactions (HOI) from 2D images is an important approach to understanding how humans and objects interact in 3D space, and it is crucial for advancing embodied AI and interaction modeling. Existing 3D HOI learning methods often fail to model fine-grained interactions in complex scenarios because they rely on visual features alone, which leads to ambiguities in human contact, object affordance, and spatial relations. To address this, we propose SKE-3DHOI, a semantic knowledge enhanced framework that integrates semantic knowledge derived from large multimodal models into visual 3D HOI reasoning. Our method generates 3D HOI semantic knowledge tensors by posing HOI-specific textual queries to large multimodal models, encodes the critical HOI semantics they return, and fuses them with visual embeddings via cross-attention fusion layers. This enables explicit alignment of visual patterns with semantic knowledge priors. Extensive experiments show that SKE-3DHOI achieves state-of-the-art performance, significantly outperforming existing methods across all metrics in 3D HOI learning. The framework bridges the gap between geometric plausibility and semantic validity, advancing robust 3D HOI understanding.
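
To make the fusion step in the abstract concrete, the following is a minimal sketch of single-head cross-attention in plain Python. It is not the authors' implementation. It assumes that visual embeddings act as queries and that semantic knowledge embeddings act as keys and values; the abstract does not specify this direction. The function name `cross_attention_fuse`, the residual connection, the identity projection matrices, and the toy inputs are all hypothetical choices made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(visual, semantic, w_q, w_k, w_v):
    """Fuse semantic tokens into visual tokens with single-head cross-attention.

    visual:   n_v x d visual embeddings, used as queries (assumption)
    semantic: n_s x d semantic knowledge embeddings, used as keys and values
    w_q, w_k, w_v: d x d projection matrices (hypothetical; normally learned)
    Returns (fused, weights): fused is n_v x d, weights is n_v x n_s.
    """
    q = matmul(visual, w_q)
    k = matmul(semantic, w_k)
    v = matmul(semantic, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose keys to d x n_s
    scores = matmul(q, k_t)
    # Each visual token gets a probability distribution over semantic tokens.
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual connection (an assumption) keeps the original visual signal.
    fused = [[a + b for a, b in zip(vr, ar)] for vr, ar in zip(visual, attended)]
    return fused, weights


# Toy inputs: three visual tokens and two semantic tokens, each of dimension 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic = [[2.0, 0.0], [0.0, 2.0]]
fused, weights = cross_attention_fuse(visual, semantic, identity, identity, identity)
```

In this toy setup, each visual token attends more strongly to the semantic token it is most similar to, and a token equally similar to both receives equal weights. A real model would use learned projections, multiple heads, and normalization layers.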
