June 29, 2018 | Open Access

Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters

Olutobi Owoputi, Brendan O’Connor (University of Massachusetts Amherst), Chris Dyer (Defence Science and Technology Laboratory)

Abstract

We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at: http://www.ark.cs.cmu.edu/TweetNLP

This paper describes release 0.3 of the “CMU Twitter Part-of-Speech Tagger” and annotated data.
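As a rough illustration of how unsupervised word clusters can feed a tagger as features, the sketch below assumes the clusters are hierarchical and that each word maps to a bit-string path in that hierarchy. The abstract does not state the clustering method, so this representation is an assumption. The cluster table, the function name `cluster_features`, and the prefix lengths are all hypothetical and are not taken from the released tagger.

```python
from typing import Dict, List, Sequence

# Hypothetical toy cluster table: word -> bit-string path in a cluster hierarchy.
# Real cluster files would be far larger; these entries are made up.
TOY_CLUSTERS: Dict[str, str] = {
    "lol": "0110",
    "lmao": "0111",
    "the": "1000",
}


def cluster_features(
    word: str,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4),
) -> List[str]:
    """Return cluster-prefix features for one token.

    Shorter prefixes name coarser clusters, so two spelling variants
    that share a prefix also share a feature.
    """
    path = clusters.get(word.lower())
    if path is None:
        # Words missing from the cluster table get a single fallback feature.
        return ["cluster=NONE"]
    return [
        f"cluster_prefix{n}={path[:n]}"
        for n in prefix_lengths
        if n <= len(path)
    ]


if __name__ == "__main__":
    for token in ["LOL", "lmao", "zzz"]:
        print(token, cluster_features(token, TOY_CLUSTERS))
```

In this toy table, "lol" and "lmao" sit in different leaf clusters but share the length-2 prefix feature, which is how a tagger could generalize from one informal variant to another that is rare or unseen in the annotated data.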

Cite This Study

Owoputi et al., "Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters."

synapsesocial.com/papers/6a0a1c19a9b588564434c483
https://doi.org/10.1184/r1/6473408
