What question did this study set out to answer?

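The study asks how to improve the recognition of object attributes and affordances, and the inference of causal relationships between them, by modeling object-concept relationships hierarchically across the visual and textual modalities.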

June 6, 2026

Deciphering Object Concepts: Hierarchical Cross-Modal Relational Reasoning for Mining Object-Attribute-Affordance Associations

Key Points

Aims to improve the understanding of object attributes and affordances by modeling the relationships between them.
Proposes a Hierarchical Cross-Modal Relational Reasoning (CORE) framework for object concept understanding.
Develops a coarse-to-fine relational reasoning module that uses multi-step learnable prompts.
Introduces a counterfactual reasoning mechanism to better capture causal relationships between attributes and affordances.
Achieves significant performance gains in object-concept mapping accuracy.
Enhances the model's ability to capture causality among object attributes and affordances.
Visualization analysis indicates a better understanding of object concepts than existing methods.
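
The coarse-to-fine idea in the key points can be illustrated with a small sketch. This is not the CORE architecture: the region features, the residual prompt update, the temperature schedule, and the function names `refine_prompt` and `concept_scores` are all assumptions chosen for illustration. A single prompt vector attends over image-region features in several steps, each step sharper than the last, and concepts are then scored independently so that one object can carry several concepts at once.

```python
import math

def softmax(scores, temperature=1.0):
    # Lower temperature gives a sharper (more localized) distribution.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def refine_prompt(prompt, regions, temperatures=(4.0, 1.0, 0.25)):
    """Hypothetical multi-step refinement: one attention step per temperature.

    Returns the final prompt and the attention weights of every step.
    """
    history = []
    for t in temperatures:
        weights = softmax([dot(prompt, r) for r in regions], t)
        # Attention-weighted pooling of the region features.
        pooled = [sum(w * r[i] for w, r in zip(weights, regions))
                  for i in range(len(prompt))]
        # Residual update: the prompt accumulates what it attended to.
        prompt = [p + q for p, q in zip(prompt, pooled)]
        history.append(weights)
    return prompt, history

def concept_scores(prompt, concept_embeddings):
    # Independent sigmoid per concept (multi-label), not a softmax over concepts.
    return {name: 1.0 / (1.0 + math.exp(-dot(prompt, emb)))
            for name, emb in concept_embeddings.items()}

# Toy inputs (made up): three 2-d region features and two concept embeddings.
regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
final_prompt, history = refine_prompt([1.0, 0.0], regions)
scores = concept_scores(final_prompt, {"metal": [1.0, 0.0], "soft": [-1.0, 0.0]})
```

In this toy run the attention mass concentrates on the regions that agree with the prompt as the steps proceed, which is the "progressively localize" behaviour the key points describe, in a much simplified form.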

Abstract

Object Concept Learning (OCL) aims to recognize high-level attributes and affordances of objects and to infer the causal relationships between them. The key is to accurately model the many-to-many mapping between objects and concepts: While an object may possess multiple concepts, a concept can also belong to multiple objects. Existing methods primarily rely on attention mechanisms to capture label correlations, which limits their ability to comprehend high-level concepts and to perform effective causal reasoning. Inspired by the human cognitive process of progressive understanding, a Hierarchical Cross-Modal Relational Reasoning (CORE) framework is proposed to enhance the understanding of object concepts through hierarchical interaction and reasoning between visual and textual modalities. Specifically, a coarse-to-fine relational reasoning module is developed, in which multi-step learnable prompts are employed to progressively localize the conceptual information of objects, thereby improving the accuracy of object-concept mapping. Subsequently, to facilitate the modeling of causal relationships between object attributes and affordances, a counterfactual reasoning mechanism is introduced. By constructing counterfactual samples and distinguishing the predictive outputs of factual and counterfactual parts, the model's ability to capture causality among concepts is enhanced. Significant performance gains and extensive visualization analysis demonstrate the superiority of our method.
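
The counterfactual mechanism described in the abstract can be sketched as follows. The abstract does not specify how counterfactual samples are built or which loss separates the factual and counterfactual outputs, so everything here is an assumption: the toy linear predictor, the intervention that zeroes one attribute, the hinge-style margin loss, and the names `predict_affordance`, `make_counterfactual`, and `counterfactual_margin_loss` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_affordance(attributes, weights, bias=0.0):
    """Toy linear affordance predictor over attribute activations (assumed)."""
    return sigmoid(sum(weights[k] * v for k, v in attributes.items()) + bias)

def make_counterfactual(attributes, removed):
    """Copy the attribute activations with one attribute set to zero."""
    cf = dict(attributes)
    cf[removed] = 0.0
    return cf

def counterfactual_margin_loss(attributes, weights, removed, margin=0.2):
    """Hinge loss asking the factual prediction to beat the counterfactual one.

    Returns (loss, effect), where effect = factual - counterfactual.
    """
    factual = predict_affordance(attributes, weights)
    counterfactual = predict_affordance(
        make_counterfactual(attributes, removed), weights)
    effect = factual - counterfactual
    return max(0.0, margin - effect), effect

# Toy example (made up): does "sharp" or "red" cause the affordance "cut"?
attributes = {"sharp": 1.0, "red": 1.0}
weights = {"sharp": 3.0, "red": 0.0}
loss_sharp, effect_sharp = counterfactual_margin_loss(attributes, weights, "sharp")
loss_red, effect_red = counterfactual_margin_loss(attributes, weights, "red")
```

Removing "sharp" changes the prediction by more than the margin, so its loss is zero. Removing "red" changes nothing, so the loss equals the margin. Applied only to attributes annotated as causes of an affordance, a loss of this kind would push the model to rely on those attributes rather than on spurious co-occurrences.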
