What question did this study set out to answer?

The study investigates how to improve retrieval performance in cross-modal systems while addressing data asymmetries.

February 12, 2026

Adaptive Co-Operative Prompting and Uncertainty-Aware Implicit Knowledge Enhancement for Cross-Modal Retrieval

Key Points

Proposed the Adaptive Co-operative Knowledge Enhancement (ACKE) method for cross-modal retrieval.
Utilized Uncertainty-Aware Inspire Potential (UAIP) to generate multi-perspective descriptions using generative LMMs.
Employed Dempster-Shafer Theory (DST) to manage and quantify semantic uncertainty in descriptions.
Developed a prompt pool for dynamic selection of instance-specific visual prompts to guide modal encoders.
Demonstrated improved accuracy in cross-modal retrieval tasks across Flickr30K and MS-COCO datasets.
Reduced semantic noise and enhanced the handling of information asymmetry in cross-modal associations.
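
As a rough illustration of how uncertainty can be quantified and turned into contribution weights, the sketch below uses the subjective-logic formulation that is often paired with Dempster-Shafer Theory: non-negative evidence per class yields belief masses plus one uncertainty mass, and these sum to one. The function names, the evidence inputs, and the `1 - uncertainty` weighting rule are assumptions made for illustration; the study's actual UAIP formulation may differ.

```python
from typing import List, Sequence, Tuple


def belief_and_uncertainty(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Map non-negative evidence to belief masses and an uncertainty mass.

    Assumed formulation (subjective logic, not taken from the paper):
    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def description_weights(evidence_per_description: Sequence[Sequence[float]]) -> List[float]:
    """Weight each generated description by its confidence (1 - uncertainty).

    Hypothetical weighting rule: weights are normalized to sum to one, and
    fall back to uniform weights when every description is fully uncertain.
    """
    if not evidence_per_description:
        return []
    confidences = [
        1.0 - belief_and_uncertainty(ev)[1] for ev in evidence_per_description
    ]
    total = sum(confidences)
    if total == 0.0:
        return [1.0 / len(confidences)] * len(confidences)
    return [c / total for c in confidences]
```

Under these assumptions, a description backed by strong evidence receives a larger share of the weight, while a description with no evidence (uncertainty of 1) contributes nothing.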

Abstract

With the rapid growth of internet multimedia data, cross-modal retrieval techniques have garnered significant attention. Given the inherent complexity and non-intuitive nature of cross-modal relationships, tuning pre-trained Large Multimodal Models (LMMs) with cross-modal data has become a mainstream approach. However, cross-modal data commonly exhibit inter-modal information asymmetry and intra-modal distribution diversity. Faced with these challenges, existing paradigms tend to learn ambiguous and asymmetric cross-modal associations, which introduce semantic noise. In addition, their limited adaptability to the high diversity of real-world content further hinders optimal retrieval performance. To address these challenges, this paper proposes the Adaptive Co-operative Knowledge Enhancement (ACKE) method, which comprises the Uncertainty-Aware Inspire Potential (UAIP) and Adaptive Co-operative Prompt (ACP) strategies. UAIP utilizes generative LMMs to generate multi-perspective descriptions that enrich semantic information, while employing Dempster-Shafer Theory (DST) to quantify their semantic uncertainty and adjust contribution weights, reducing inaccurate relational mappings and balancing information asymmetry. ACP constructs a prompt pool where instance-specific visual prompts are dynamically selected and projected into text prompts, which collaborate to guide modal encoders toward deep semantic consensus, thus mitigating alignment bias from intra-modal distribution diversity and improving accuracy. Extensive experiments are conducted on two widely used datasets, Flickr30K and MS-COCO, demonstrating the effectiveness of our proposed method. The code is available at https://github.com/nynu-BDAI/ACKE.
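
The prompt-pool idea described for ACP can be sketched roughly as follows: each pool entry pairs a key with a visual prompt, the entries whose keys best match an instance feature are selected, and each selected visual prompt is mapped into the text prompt space by a linear projection. The cosine-similarity matching, the top-k rule, the linear projection, and all names here are assumptions for illustration only, not the authors' implementation (which is available in the linked repository).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_prompts(
    query: Vector,
    keys: Sequence[Vector],
    prompts: Sequence[Vector],
    top_k: int,
) -> Tuple[List[int], List[Vector]]:
    """Pick the top_k visual prompts whose keys are most similar to the query.

    `query` stands in for an instance-level image feature (hypothetical input).
    """
    if len(keys) != len(prompts):
        raise ValueError("keys and prompts must have the same length")
    scores = [cosine(query, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    return chosen, [prompts[i] for i in chosen]


def project(prompt: Vector, weight: Sequence[Vector]) -> List[float]:
    """Linear map from visual prompt space to text prompt space.

    `weight` has one row per output dimension (assumed, untrained here).
    """
    return [sum(w * p for w, p in zip(row, prompt)) for row in weight]
```

In a trained system the keys, prompts, and projection weights would be learned parameters; here they are plain lists so the selection and projection steps can be followed by hand.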


Cite This Study

Huang et al. studied this question.

synapsesocial.com/papers/698d6efe5be6419ac0d5504f https://doi.org/10.1145/3797043
