The integration of product data in e-commerce poses a challenge due to the data's inherent heterogeneity and decentralized distribution. Product matching has emerged as a solution to this data integration problem, enabling the aggregation of distinct offers for identical products. Numerous approaches to product matching have been proposed recently. However, state-of-the-art techniques require labeled product data, which can pose a significant challenge for specific languages. This work explores Cross-Lingual Learning (CLL) techniques in product matching. CLL leverages language models from languages with abundant linguistic resources to transfer knowledge to low-resource languages. Specifically, we employ English product data to train models for Portuguese. We demonstrate the ability to achieve promising results in the product matching task using a reduced volume of training data in the target language. Furthermore, we surpass baseline approaches by exploring various CLL strategies and analyzing different large language models (LLMs). To the best of our knowledge, this work is the first to explore multiple CLL strategies for the product matching task. Our results suggest that CLL is a promising approach to enhance the performance of product matching models in languages with limited resources.
Alves et al. studied this question.