Korean part-of-speech (POS) tagging is more difficult than POS tagging for languages such as English, Russian, and Chinese because it must also handle word segmentation and the analysis of sound-changed morphemes. In this paper, we propose a transformer-based Korean POS tagging model that, unlike existing deep learning-based POS tagging models built on BiLSTM, combines the output vector of the transformer encoder with a representation vector of the input word obtained from a character-level word embedding network. First, to perform word segmentation and sound-change analysis in a single step, we design the model so that its output is a sequence of pairs, each consisting of a morpheme string and its POS tag. Second, to obtain character-level word representations, we train a word embedding network built from a convolutional network and a highway network. Finally, to make more effective use of the semantic information of the input word when generating the POS tag sequence, we combine the word representation vector obtained from the word embedding network with the output of the transformer encoder. In our experiments, the proposed model improves performance by 1.4% over the same model without the word representation vector and reaches a POS tagging accuracy of 96.1%, which is higher than all other compared models, including the BiLSTM+CRF model.
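To make the output format concrete, the following is a minimal sketch of how a sequence of morpheme and POS tag pairs could be flattened into a target token sequence for a sequence-to-sequence tagger and then recovered. It is an illustration under stated assumptions, not the encoding used in the paper: the function names `encode_target` and `decode_target`, the word-boundary symbol `<sp>`, the `/TAG` token convention, and the Sejong-style tags in the example are all hypothetical. Note that the surface syllable 갔 does not appear in the target, because the restored morphemes 가 and 았 are generated instead.

```python
from typing import List, Tuple

Morph = Tuple[str, str]   # (morpheme string, POS tag)
SEP = "<sp>"              # hypothetical word-boundary symbol (assumption)


def encode_target(words: List[List[Morph]]) -> List[str]:
    """Flatten per-word analyses into a single target token sequence.

    Each morpheme is spelled out character by character and followed by
    one tag token of the form "/TAG"; words are separated by SEP.
    """
    tokens: List[str] = []
    for i, morphs in enumerate(words):
        if i > 0:
            tokens.append(SEP)
        for surface, tag in morphs:
            tokens.extend(surface)      # one token per character
            tokens.append(f"/{tag}")    # tag token closes the morpheme
    return tokens


def decode_target(tokens: List[str]) -> List[List[Morph]]:
    """Recover (morpheme, tag) pairs per word from a target sequence."""
    words: List[List[Morph]] = [[]]
    chars: List[str] = []
    for tok in tokens:
        if tok == SEP:
            words.append([])
        elif tok.startswith("/") and len(tok) > 1:
            words[-1].append(("".join(chars), tok[1:]))
            chars = []
        else:
            chars.append(tok)
    return words


# Example input "나는 갔다": the second word contains a sound change
# (갔 = 가 + 았). Tags are Sejong-style and serve only as an illustration.
analysis = [
    [("나", "NP"), ("는", "JX")],
    [("가", "VV"), ("았", "EP"), ("다", "EF")],
]
tokens = encode_target(analysis)
print(tokens)
print(decode_target(tokens) == analysis)
```

Generating morpheme characters and tag tokens in one sequence lets a single decoder handle segmentation, restoration of sound-changed morphemes, and tagging together, which is the motivation stated above for the paired output.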
Pong-Gol You
Chun-Sik So
Song-Min Choe
American Journal of Neural Networks and Applications
Korean Academy of Science and Technology
You et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69e1cf985cdc762e9d8588e0 — DOI: https://doi.org/10.11648/j.ajnna.20261201.12