Finding Similar Sentences across Multiple Languages in Wikipedia

Abstract

We investigate whether the Wikipedia corpus is amenable to multilingual analysis that aims at generating parallel corpora. We present the results of the application of two simple heuristics for the identification of similar text across multiple languages in Wikipedia. Despite the simplicity of the methods, evaluation carried out on a sample of Wikipedia pages shows encouraging results.

Cite This Study

Adafre et al. studied this question.

synapsesocial.com/papers/6a21efb089ae9bae15e21709 https://doi.org/10.1097/00006982-200310000-00028
