Entity matching (EM) aims to identify records from different data sources that refer to the same real-world entity. Despite remarkable advances driven by pretrained language models (PLMs), existing PLM-based matchers still struggle to integrate external knowledge, to represent semantic information at multiple granularities, and to handle numerical snippets. To address these challenges, we propose a multigranularity information-enhanced EM method based on collaborative agents (MIEM-CA) with three key components: (1) a multiagent information enhancement module (MI) that integrates attribute-selection, web-search, and feature-extraction agents to improve the completeness of entity representations, drawing on extensive external knowledge, the decision-making and collaboration capabilities of autonomous agents, and the semantic comprehension of large language models (LLMs); (2) a multigranularity semantic encoder (ME) that incrementally captures and integrates token-, attribute-, and entity-level semantics, together with their cross-level correlations, through hierarchical representations spanning the token, attribute, and entity layers (ELs); and (3) a numerical-aware agent module (NA) that uses a chain-of-thought (CoT) strategy to extract numerical information, leverages LLMs to infer the semantic types of the extracted values, and computes a semantic-aware numerical similarity. Comprehensive experiments on 10 benchmark datasets covering structured, dirty, and textual EM settings show that, compared with five baseline methods, MIEM-CA improves the average F1 score by 6.35% on structured datasets, 9.07% on dirty datasets, and 8.11% across all datasets.
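The abstract describes the numerical-aware module (NA) only at a high level and does not give its similarity formula. The following is a minimal sketch of one way a semantic-aware numerical similarity could work, assuming that two numbers are compared only when their inferred semantic types agree. The regex extractor and the `UNIT_TYPES` table are hypothetical stand-ins for the CoT extraction and LLM-based type inference; the relative-difference score and the symmetric soft matching within each type are illustrative assumptions, not the method MIEM-CA actually uses.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical unit table standing in for LLM-based semantic type inference.
# Each unit maps to (semantic_type, scale factor into that type's base unit).
UNIT_TYPES: Dict[str, Tuple[str, float]] = {
    "gb": ("capacity_gb", 1.0),
    "tb": ("capacity_gb", 1024.0),
    "in": ("inches", 1.0),
    "h": ("hours", 1.0),
    "$": ("price_usd", 1.0),
}

# Optional "$", a number (with optional decimals), then an optional alphabetic unit.
NUM_PATTERN = re.compile(r"(\$)?\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)?")


def extract_typed_numbers(text: str) -> List[Tuple[str, float]]:
    """Regex stand-in for CoT extraction: returns (semantic_type, value) pairs."""
    pairs = []
    for dollar, number, unit in NUM_PATTERN.findall(text):
        key = "$" if dollar else unit.lower()
        sem_type, scale = UNIT_TYPES.get(key, ("unknown", 1.0))
        pairs.append((sem_type, float(number) * scale))
    return pairs


def numeric_similarity(a: float, b: float) -> float:
    """Relative closeness in [0, 1]; identical values score 1."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def _group_by_type(pairs: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    # Drop values whose semantic type could not be inferred.
    groups: Dict[str, List[float]] = {}
    for sem_type, value in pairs:
        if sem_type != "unknown":
            groups.setdefault(sem_type, []).append(value)
    return groups


def _soft_match(xs: List[float], ys: List[float]) -> float:
    # For each value in xs, keep its closest counterpart in ys, then average.
    return sum(max(numeric_similarity(x, y) for y in ys) for x in xs) / len(xs)


def semantic_numeric_similarity(left: str, right: str) -> Optional[float]:
    """Average per-type similarity over semantic types present in both records.

    Returns None when the records share no typed numbers.
    """
    lg = _group_by_type(extract_typed_numbers(left))
    rg = _group_by_type(extract_typed_numbers(right))
    shared = sorted(lg.keys() & rg.keys())
    if not shared:
        return None
    scores = [(_soft_match(lg[t], rg[t]) + _soft_match(rg[t], lg[t])) / 2 for t in shared]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    left = "Apple MacBook Air 13.3 in 8 GB 256 GB $999"
    right = "Apple MacBook Air 13.3 in 8 GB 512 GB $1099"
    print(semantic_numeric_similarity(left, right))
```

Keeping extraction separate from scoring means the regex stand-in could be replaced by an LLM-based extractor and type classifier without changing the comparison logic. Filtering by semantic type is what prevents, for example, a price from being compared against a storage capacity.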
Xu et al. studied this question.