What question did this study set out to answer?

The aim is to enhance cross-modal retrieval accuracy when dealing with partially mismatched data pairs.

March 1, 2026

Exploring Hierarchical Cross-Modal Correlation Consistency for Partial Mismatching

Key Points

The aim is to enhance cross-modal retrieval accuracy when dealing with partially mismatched data pairs.
The proposed EH3C approach leverages neighborhood correlation distributions to optimize cross-modal alignment.
It measures soft matching degrees between cross-modal data pairs without relying on ideal distribution assumptions.
It enhances inter-class separability by exploiting negative correlations among reliable negative sample pairs.
Extensive experiments on three benchmark datasets evaluate its performance.
EH3C significantly improves retrieval performance in scenarios with partial mismatches compared to traditional methods.
It learns positive correlations effectively even when some data pairs are semantically inconsistent.

Abstract

Cross-modal retrieval facilitates more flexible information access and improves semantic understanding across different modalities. However, traditional cross-modal retrieval models rely on well-aligned datasets, which are often labor-intensive and costly to obtain. In real-world applications, data inevitably includes mismatched pairs, and these semantically inconsistent pairs can significantly degrade retrieval performance. Previous approaches have assumed ideal loss value distributions to optimize models for accurate semantic matching through soft-label estimation. However, the absence of hierarchical semantic correlation learning limits the effectiveness of these models in scenarios involving partial mismatches. To address these challenges, we propose Exploring Hierarchical Cross-Modal Correlation Consistency (EH3C) for cross-modal retrieval under partially mismatched conditions. Specifically, our approach first leverages neighborhood correlation distributions among samples to optimize cross-modal alignment, without assuming ideal distributions. This allows for the measurement of soft matching degrees between cross-modal data pairs and facilitates the effective learning of their positive correlations. Next, we enhance inter-class separability through intra-modal correlation learning by exploiting negative correlations between reliable negative sample pairs, thus enabling a more comprehensive exploration of cross-modal correlations. Finally, to assess the effectiveness and robustness of our approach, we conducted extensive experiments on three benchmark datasets. The results demonstrate that the proposed EH3C significantly improves cross-modal retrieval performance in scenarios involving partial mismatches.
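
To make the first step of the abstract more concrete, the following is a minimal sketch of how a soft matching degree might be derived from neighborhood correlation distributions. It is not the authors' implementation: the abstract does not give the formula, so the softmax over intra-modal cosine similarities, the Bhattacharyya overlap used to compare the two distributions, the `temperature` value, the function names, and the toy features are all assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_distribution(feats, i, temperature=0.1):
    """Softmax over the similarities of sample i to every other sample
    of the same modality (hypothetical stand-in for a 'neighborhood
    correlation distribution')."""
    sims = [cosine(feats[i], feats[j]) / temperature
            for j in range(len(feats)) if j != i]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def soft_matching_degree(img_feats, txt_feats, i, temperature=0.1):
    """Overlap in [0, 1] between the image-side and text-side
    neighborhood distributions of pair i (Bhattacharyya coefficient,
    an assumed choice of consistency measure)."""
    p = neighborhood_distribution(img_feats, i, temperature)
    q = neighborhood_distribution(txt_feats, i, temperature)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Toy data (assumed): pairs 0-2 are correctly matched; pair 3 is a
# mismatch, because its image looks like pair 2 while its text looks
# like pairs 0 and 1.
img_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
txt_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]

degrees = [soft_matching_degree(img_feats, txt_feats, i)
           for i in range(len(img_feats))]
```

In this toy setup the mismatched pair receives a much lower degree than the matched pairs, because its image and its text have different neighbors within their own modalities. Such a degree could then act as a soft label that down-weights suspicious pairs during training, without fitting any assumed distribution to the loss values.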
