We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English-language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at http://www.ark.cs.cmu.edu/TweetNLP. This paper describes release 0.3 of the “CMU Twitter Part-of-Speech Tagger” and annotated data.
Olutobi Owoputi
Brendan O’Connor
Chris Dyer
Carnegie Mellon University
Toyota Technological Institute at Chicago
www.synapsesocial.com/papers/6a0a1c19a9b588564434c483 — DOI: https://doi.org/10.1184/r1/6473408