January 1, 2007

Unsupervised models for morpheme segmentation and morphology learning

Abstract

We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages, where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and the form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on Finnish and English data sets of different sizes. Morfessor is shown to perform very well compared to a widely known benchmark algorithm, in particular on Finnish data.
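To make the maximum a posteriori idea concrete, the following is a minimal sketch of a two-part cost: the negative log prior of a morph lexicon plus the negative log likelihood of the corpus given that lexicon. It is a toy illustration and not the Morfessor model itself. The function name `map_cost`, the uniform letter prior, the unigram likelihood, and the six-word corpus are all assumptions made for this example, and the prior here encodes only the form of the morphs, not their usage.

```python
import math
from collections import Counter


def map_cost(segmented_corpus, alphabet_size=27):
    """Toy MAP cost in bits: -log2 P(lexicon) - log2 P(corpus | lexicon).

    segmented_corpus: list of words, each word given as a list of morphs.
    alphabet_size: assumed 26 letters plus one end-of-morph marker.
    """
    # Count how often each morph is used across all word tokens.
    morph_counts = Counter(m for word in segmented_corpus for m in word)
    total = sum(morph_counts.values())

    # Prior (form only, an assumption): every distinct morph in the lexicon is
    # spelled letter by letter from a uniform alphabet, plus an end marker.
    lexicon_cost = sum(
        (len(morph) + 1) * math.log2(alphabet_size) for morph in morph_counts
    )

    # Likelihood: each morph token is drawn independently with its
    # maximum-likelihood unigram probability.
    corpus_cost = -sum(
        count * math.log2(count / total) for count in morph_counts.values()
    )
    return lexicon_cost + corpus_cost


# Hypothetical toy corpus, once unsegmented and once split into stem + suffix.
whole = [["walked"], ["walking"], ["talked"], ["talking"], ["jumped"], ["jumping"]]
split = [["walk", "ed"], ["walk", "ing"], ["talk", "ed"],
         ["talk", "ing"], ["jump", "ed"], ["jump", "ing"]]

print(f"unsegmented cost: {map_cost(whole):.1f} bits")
print(f"segmented cost:   {map_cost(split):.1f} bits")
```

Under these assumptions the segmented analysis costs fewer bits, because the shared stems and suffixes are stored once in the lexicon and reused. A search over candidate segmentations that minimizes such a cost is one way to read "induction of a lexicon" in a MAP setting.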
