May 8, 2019

Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts

Abstract

Part-of-speech tagging, morphological tagging, and lemmatization of historical texts pose special challenges because of high spelling variability and the lack of large, high-quality training corpora. Researchers therefore often first map the words to their modern spelling and then annotate them with tools trained on modern corpora. We show in this paper that high-quality part-of-speech tagging and lemmatization of historical texts are possible while operating directly on the historical spelling. We use a part-of-speech tagger based on bidirectional long short-term memory networks (LSTMs) with character-based word representations, and we lemmatize with an encoder-decoder system with attention. We achieve state-of-the-art results for modern German morphological tagging on the Tiger corpus and also on two historical corpora that have been used in previous work.
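
The abstract only names the architecture. As a rough illustration of what a character-based word representation means, the sketch below runs a forward and a backward LSTM over a word's characters and concatenates their final hidden states. It assumes NumPy; the class names LSTMCell and CharWordEncoder, the dimensions, and the random, untrained weights are hypothetical and not taken from the paper. Because the vector is built from characters, a historical spelling variant such as "vnd" for "und", or a character like the long s "ſ", still receives a representation instead of collapsing into a single unknown-word token.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM cell (hypothetical helper); gates are computed from [x; h_prev]."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix holding the input, forget, output and candidate blocks.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:4 * H])  # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def run(self, xs):
        """Feed a sequence of vectors and return the final hidden state."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class CharWordEncoder:
    """Hypothetical character-level BiLSTM word encoder with untrained weights."""

    def __init__(self, alphabet, char_dim=16, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0 is reserved for characters not in the alphabet.
        self.index = {ch: i + 1 for i, ch in enumerate(sorted(alphabet))}
        self.emb = rng.normal(0.0, 0.1, size=(len(self.index) + 1, char_dim))
        self.fwd = LSTMCell(char_dim, hidden_dim, rng)
        self.bwd = LSTMCell(char_dim, hidden_dim, rng)

    def encode(self, word):
        xs = [self.emb[self.index.get(ch, 0)] for ch in word]
        h_fwd = self.fwd.run(xs)        # reads the word left to right
        h_bwd = self.bwd.run(xs[::-1])  # reads the word right to left
        return np.concatenate([h_fwd, h_bwd])


encoder = CharWordEncoder("abcdefghijklmnopqrstuvwxyzäöüß")
vec = encoder.encode("vnd")  # 64-dimensional vector (2 x hidden_dim)
```

In a full tagger along the lines the abstract describes, these word vectors would be passed to a second, word-level bidirectional LSTM that predicts one tag per token, and all weights would be trained jointly; both steps are omitted here to keep the sketch short.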
