What question did this study set out to answer?

This research aims to address inconsistencies in cross-modal hashing retrieval through fine-grained disentanglement and alignment.

March 1, 2026 · Open Access

Fine-Grained Disentanglement for Alleviating Inconsistencies in Cross-Modal Hashing Retrieval

Key Points

Developed the Inconsistency Alleviated Fine-Grained (IAFG) framework.
Introduced Semantic Component Disentanglement (SCD) for modality-common and modality-unique information separation.
Implemented Fine-grained Semantic Alignment (FSA) for accurate cross-modal alignment at the component level.
Conducted extensive experiments on benchmark datasets to evaluate performance.
Showed in validation experiments that incorporating fine-grained features yields up to 6% accuracy improvement over coarse-grained methods.
Demonstrated state-of-the-art performance in retrieval accuracy across different modalities.
Confirmed the significance of fine-grained semantic components in enhancing retrieval outcomes.
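
For readers new to the retrieval setting, the following is a minimal sketch of how hashing-based cross-modal retrieval ranks results in general. It is not taken from the paper: the sign-based binarization, the 4-bit code length, the toy feature values, and the function names (`to_hash_codes`, `hamming_distance`, `rank_by_hamming`) are all illustrative assumptions.

```python
import numpy as np

def to_hash_codes(features):
    """Binarize real-valued features into {-1, +1} codes by sign (an assumed scheme)."""
    return np.where(features >= 0, 1, -1).astype(np.int8)

def hamming_distance(query_code, db_codes):
    """Hamming distance between one code and each row of db_codes.

    For codes in {-1, +1} of length K, d = (K - <q, b>) / 2.
    """
    k = query_code.shape[0]
    return (k - db_codes.astype(np.int32) @ query_code.astype(np.int32)) // 2

def rank_by_hamming(query_code, db_codes):
    """Indices of database items sorted from nearest to farthest."""
    return np.argsort(hamming_distance(query_code, db_codes), kind="stable")

# Toy data (hypothetical): one "text" query and three "image" entries.
text_query = to_hash_codes(np.array([0.9, -0.2, 0.4, -0.7]))
image_db = to_hash_codes(np.array([
    [0.8, -0.1, 0.3, -0.5],   # same signs as the query
    [-0.6, 0.2, 0.1, -0.9],   # differs from the query in two bits
    [-0.3, 0.7, -0.2, 0.4],   # differs from the query in all four bits
]))

distances = hamming_distance(text_query, image_db)
ranking = rank_by_hamming(text_query, image_db)
print(distances, ranking)
```

Because both modalities are mapped into the same binary space, a text query can be compared directly against image codes. Inconsistencies between modalities show up as bit disagreements between items that should match.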

Abstract

Cross-modal hashing retrieval faces fundamental challenges from modality-modality (M-M) and modality-label (M-L) inconsistencies inherent in multimodal data. Existing methods rely on coarse-grained disentanglement to address these inconsistencies, but suffer from inaccurate semantic separation and modality-common semantic information loss during cross-modal alignment. Through comprehensive analysis, we demonstrate that coarse-grained approaches fail to effectively alleviate modality inconsistencies. Our validation experiments show that incorporating fine-grained features yields up to 6% accuracy improvements over coarse-grained methods, confirming that fine-grained semantic components are critical for robust cross-modal retrieval. However, existing fine-grained methods require extensive pre-training and lack seamless integration into end-to-end frameworks. In this paper, we propose Inconsistency Alleviated Fine-Grained (IAFG) cross-modal hashing retrieval, a novel framework that enables semantic component-level disentanglement and alignment without extensive pre-training. Our approach introduces two key innovations: Semantic Component Disentanglement (SCD) that achieves fine-grained separation of modality-common and modality-unique information using learnable query vectors and competitive feature routing, and Fine-grained Semantic Alignment (FSA) that realizes accurate cross-modal alignment at the component level while preserving semantic details through component-level cross-attention and cross-modal triplet alignment. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art performance with significant improvements in retrieval accuracy across different modalities.
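
The abstract names two mechanisms without giving formulas: query vectors that compete for features, and a triplet objective across modalities. The sketch below shows one generic way such ideas are commonly realized; it is not the IAFG implementation. Everything here is an assumption made for illustration: the softmax over the query axis (in the style of slot attention), the sharing of queries between modalities, the cosine-distance triplet loss, the margin value, the random inputs, and the names `route_to_components` and `triplet_loss`.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route_to_components(tokens, queries):
    """Pool token features into components, with queries competing per token.

    tokens:  (n_tokens, dim) features from one modality
    queries: (n_components, dim) query vectors (learned in practice; random here)
    The softmax runs over the query axis, so each token's assignment
    weights across components sum to 1 (assumed routing rule).
    """
    scores = tokens @ queries.T / np.sqrt(tokens.shape[1])      # (n_tokens, n_components)
    assign = softmax(scores, axis=1)
    # Normalize per component so each component is a weighted mean of tokens.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    components = weights.T @ tokens                             # (n_components, dim)
    return components, assign

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss on cosine distance (assumed form and margin)."""
    def cos_dist(a, b):
        return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return max(0.0, cos_dist(anchor, positive) - cos_dist(anchor, negative) + margin)

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(6, 8))   # hypothetical image patch features
text_tokens = rng.normal(size=(5, 8))    # hypothetical word features
queries = rng.normal(size=(3, 8))        # stand-ins for learned queries

img_comp, img_assign = route_to_components(image_tokens, queries)
txt_comp, _ = route_to_components(text_tokens, queries)

# Pull image component 0 toward text component 0 and away from text component 1.
loss = triplet_loss(img_comp[0], txt_comp[0], txt_comp[1])
print(img_comp.shape, float(loss))
```

In a trained model the queries and feature encoders would be optimized jointly, and the choice of positives and negatives would follow the paper's own definition of matching components, which the abstract does not spell out.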


Cite This Study

Li et al. studied this question.

synapsesocial.com/papers/69a3d79dec16d51705d2de67 https://doi.org/10.1007/s41019-025-00330-w
