Conventional methods address cross-modal retrieval by projecting multi-modal data into a shared representation space. This strategy inevitably loses modality-specific information, which lowers retrieval accuracy. In this paper, we propose heterogeneous graph embeddings that preserve richer cross-modal information: the embedding from one modality is compensated with aggregated embeddings from the other modality. In particular, a self-denoising tree search is designed to mitigate the "label noise" problem, making the heterogeneous neighborhood more semantically relevant. A dual-path aggregation tackles the "modality imbalance" problem, giving each sample comprehensive dual-modality information. The final heterogeneous graph embedding is obtained by feeding the aggregated dual-modality features into a cross-modal self-attention module. Experiments on cross-modality person re-identification and image-text retrieval tasks validate the superiority and generality of the proposed method.
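The aggregation-then-attention idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes cosine-similarity top-k neighbors in place of the self-denoising tree search (which is omitted entirely), mean pooling as the aggregator, and a parameter-free scaled dot-product attention in place of the learned cross-modal self-attention module. All function names and the toy feature vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_of_top_k(query, pool, k):
    """Average the k pool vectors most cosine-similar to the query.

    Assumption: plain top-k stands in for the paper's neighbor selection.
    """
    ranked = sorted(pool, key=lambda v: cosine(query, v), reverse=True)[:k]
    return [sum(v[i] for v in ranked) / len(ranked) for i in range(len(query))]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens):
    """Parameter-free scaled dot-product self-attention, mean-pooled over tokens.

    Assumption: no learned query/key/value projections are used here.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * t[i] for w, t in zip(weights, tokens))
                        for i in range(dim)])
    return [sum(o[i] for o in outputs) / len(outputs) for i in range(dim)]

def heterogeneous_embedding(x, same_pool, other_pool, k=2):
    """Dual-path sketch: aggregate neighbors from both modalities, then attend."""
    same = mean_of_top_k(x, same_pool, k)    # same-modality path
    other = mean_of_top_k(x, other_pool, k)  # other-modality path
    return attend([same, other])

# Toy 2-D features, invented purely for illustration.
image_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_feats = [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]
emb = heterogeneous_embedding(image_feats[0], image_feats, text_feats, k=2)
```

Because the attention weights are a softmax, the resulting embedding is a convex combination of the two aggregated paths, so the query keeps its own modality's information while being compensated by its neighbors in the other modality.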
Chen et al. (Mon,) studied this question.