What question did this study set out to answer?

This research aims to improve multimodal named-entity recognition by addressing modality noise and enhancing cross-modal interactions.

February 16, 2026 | Open Access

Multimodal Named-Entity Recognition Based on Symmetric Fusion with Contrastive Learning

Key Points

Aims to improve multimodal named-entity recognition (MNER) by reducing modality noise and strengthening cross-modal interaction.
Proposes a model that combines symmetric multimodal fusion with contrastive learning.
Builds a symmetric-encoder collaborative architecture from a modality refinement encoder and an aligned encoder.
Evaluates the model on two datasets against state-of-the-art methods.
Reports that the model outperforms those state-of-the-art methods on named-entity recognition.
Uses ablation experiments to validate the symmetric encoder's benefit for consistent multimodal learning.

Abstract

Multimodal named-entity recognition (MNER) aims to identify entity information by leveraging multimodal features. With recent research shifting to multi-image scenarios, existing methods overlook modality noise and lack effective cross-modal interaction, leading to prominent semantic gaps. This study innovatively integrates symmetric multimodal fusion with contrastive learning, proposing a novel model with a symmetric-encoder collaborative architecture. To mitigate the noise, a modality refinement encoder maps each modality to an exclusive space, while an aligned encoder bridges gaps via contrastive learning in a shared space, surpassing the superficial cross-modal mapping of existing models. Building on these encoders, the symmetric fusion module achieves deep bidirectional fusion, breaking traditional one-way or concatenation-based limitations. Experiments on two datasets show the model outperforms state-of-the-art methods, with ablation experiments validating the symmetric encoder’s uniqueness for consistent multimodal learning.
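
The abstract does not state which contrastive objective the aligned encoder uses. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style loss, a common choice for pulling paired text and image embeddings together in a shared space. The function names, the temperature of 0.07, and the toy embeddings are all hypothetical and are not taken from the study.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Assumed symmetric InfoNCE loss over a batch of paired embeddings.

    text_emb[i] and image_emb[i] are treated as a matching pair; every
    other combination in the batch serves as a negative example.
    """
    if len(text_emb) != len(image_emb):
        raise ValueError("text and image batches must have the same size")
    t = [l2_normalize(v) for v in text_emb]
    g = [l2_normalize(v) for v in image_emb]
    # Cosine similarity of every text with every image, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(ti, gj)) / temperature for gj in g]
           for ti in t]
    sim_transposed = [list(col) for col in zip(*sim)]
    # Average the text-to-image and image-to-text directions.
    return 0.5 * (cross_entropy_diagonal(sim)
                  + cross_entropy_diagonal(sim_transposed))


# Toy batch: two text embeddings and two image embeddings (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
images_aligned = [[1.0, 0.0], [0.0, 1.0]]
images_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(text, images_aligned)
loss_swapped = symmetric_contrastive_loss(text, images_swapped)
print(loss_aligned < loss_swapped)
```

Under this assumed objective, correctly paired embeddings yield a much lower loss than mismatched ones, which is the sense in which a shared space can narrow the semantic gap between modalities. Averaging the two directions keeps the objective symmetric in text and image.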
