August 14, 2025 | Open Access

CASF: Correlation-Alignment and Significance-Aware Fusion for Multimodal Named Entity Recognition

Key Points

CASF-MNER improves entity recognition accuracy across multimodal datasets, addressing feature alignment challenges.
Experimental results indicate that CASF-MNER performs strongly on both the Twitter-2015 and Twitter-2017 datasets, reflecting improved multimodal integration.
This approach employs a dynamic perception mechanism and entropy weighting to suppress noise and boost key feature expression.
The model leverages cross-modal attention and contrastive learning for deep semantic and feature fusion, aiming for optimal representational consistency.
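
The entropy weighting idea in the key points can be illustrated with a minimal sketch. The source does not give the weighting formula, so this assumes one plausible reading: each feature vector is turned into a distribution with a softmax, its Shannon entropy is computed, and vectors with lower entropy (more peaked activations) receive larger weights. The function names `softmax`, `entropy`, `entropy_weights`, and `reweight` are hypothetical and not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(features):
    """Return one weight per feature vector; the weights sum to 1.

    ASSUMPTION (not from the paper): a peaked, low-entropy activation
    pattern is treated as salient, and a flat, high-entropy one as noise.
    """
    entropies = [entropy(softmax(f)) for f in features]
    # Negate so that lower entropy maps to a larger weight.
    return softmax([-h for h in entropies])


def reweight(features):
    """Scale each feature vector by its entropy-derived weight."""
    weights = entropy_weights(features)
    return [[w * x for x in f] for w, f in zip(weights, features)]


# Toy input: one peaked vector and one flat vector.
features = [[5.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
weights = entropy_weights(features)
print(weights)  # the peaked first vector receives the larger weight
```

In this toy setting the flat vector is down-weighted relative to the peaked one, which is the noise-suppression behavior the key point describes. The actual model may compute entropy over a different distribution, for example over multiscale pooled features.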

Abstract

As social media content grows richer, Multimodal Named Entity Recognition (MNER) faces the dual challenges of fusing heterogeneous features and recognizing entities accurately. To address three key problems (inconsistent distributions of textual and visual information, insufficient feature alignment, and noise interference during fusion), this paper proposes CASF-MNER, a multimodal named entity recognition model based on a dual-stream Transformer. First, the model designs cross-modal cross-attention over visual and textual features and constructs a bidirectional interaction mechanism between single-layer features, which yields higher-order semantic correlation modeling and aligns the modal features by cross-modal relevance. Second, it builds a dynamic perception mechanism for multimodal feature saliency based on multiscale pooling, together with an entropy weighting strategy over global feature distribution information, to adaptively suppress noise and redundancy and to enhance the expression of key features. Third, it establishes a deep semantic fusion method based on a hybrid isomorphic model, designs a progressive cross-modal interaction structure, and combines these with contrastive learning to achieve global fusion in the deep semantic space and to optimize representational consistency. Experimental results show that CASF-MNER achieves strong performance on the public Twitter-2015 and Twitter-2017 datasets, which verifies the effectiveness and advancement of the proposed method.
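
The bidirectional cross-modal interaction described in the abstract can be sketched as two scaled dot-product attention passes, one in each direction. This is an illustration under stated assumptions, not the authors' implementation: it uses a single head, omits the learned query, key, and value projections, and adds a residual connection that the abstract does not mention. The names `cross_attend` and `bidirectional_cross_attention` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Each query row attends over all rows of the other modality.

    ASSUMPTION: single head, no learned projections; the rows of
    keys_values serve as both keys and values.
    """
    dim = len(queries[0])
    contexts = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        contexts.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values))
             for j in range(dim)]
        )
    return contexts


def bidirectional_cross_attention(text, image):
    """Text attends to image regions and image regions attend to text."""
    text_ctx = cross_attend(text, image)
    image_ctx = cross_attend(image, text)
    # ASSUMPTION: residual connection, added here for illustration only.
    text_out = [[t + c for t, c in zip(tr, cr)]
                for tr, cr in zip(text, text_ctx)]
    image_out = [[v + c for v, c in zip(vr, cr)]
                 for vr, cr in zip(image, image_ctx)]
    return text_out, image_out


# Toy input: three token vectors and two image-region vectors.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [[1.0, 0.0], [0.0, 1.0]]
text_out, image_out = bidirectional_cross_attention(text, image)
print(len(text_out), len(image_out))
```

Each modality keeps its own sequence length while absorbing context from the other, which is the sense in which the two streams are aligned by relevance. A full model would stack such layers and add learned projections, normalization, and multiple heads.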
