December 7, 2017

News Classification from Social Media Using Twitter-based Doc2Vec Model and Automatic Query Expansion

Abstract

News classification is among the essential needs for people to organize, better understand, and utilize information from the Internet. This motivates us to propose a novel method to classify news from social media. First, we propose to vectorize an article with TD2V, our pre-trained Twitter-based universal document representation following the Doc2Vec approach. We then define a Modified Distance to better measure the semantic distance between two document vectors. Finally, we apply retrieval and automatic query expansion to get the most relevant labeled documents in a corpus and use them to determine the category of a new article. Because TD2V is built from 297 million words in 420,351 news articles linked from more than one million tweets on Twitter from 2010 to 2017, it can be used as an efficient pre-trained model for English document representation in various applications. Experiments on datasets from different online sources show that our method achieves better classification accuracy than existing methods, specifically 98.4±0.3% (BBC dataset), 98.9±0.7% (BBC Sport dataset), 94.1±0.2% (Amazon4 dataset), and 78.6% (20NewsGroup dataset). Furthermore, the classification training process only encodes all articles in the training set with TD2V; it does not train a dedicated classification model for each of these datasets.
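The retrieval-based classification step described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not define the Modified Distance, so plain cosine similarity stands in for it; the toy 2-D vectors stand in for TD2V embeddings; query expansion is done by averaging the query with its top-ranked neighbours, which is one common strategy and not necessarily the authors'; and the names `normalize` and `classify_with_query_expansion` are hypothetical.

```python
from collections import Counter

import numpy as np


def normalize(vectors):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def classify_with_query_expansion(query_vec, corpus_vecs, corpus_labels, k=3, rounds=1):
    """Label a document by retrieving its nearest labeled neighbours.

    Assumption: cosine similarity replaces the paper's Modified Distance,
    and expansion averages the query with its current top-k neighbours.
    """
    corpus = normalize(corpus_vecs)
    query = normalize(query_vec)

    for _ in range(rounds):
        # Retrieve the k most similar labeled documents.
        top = np.argsort(-(corpus @ query))[:k]
        # Expand the query by pulling it toward the retrieved documents.
        query = normalize(query + corpus[top].mean(axis=0))

    # Final retrieval with the expanded query, then a majority vote.
    top = np.argsort(-(corpus @ query))[:k]
    votes = Counter(corpus_labels[i] for i in top)
    return votes.most_common(1)[0][0]


# Toy stand-ins for document vectors of labeled articles (not real TD2V output).
corpus_vecs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3],
               [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
corpus_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

predicted = classify_with_query_expansion([0.7, 0.2], corpus_vecs, corpus_labels)
```

No classifier is trained here: the labeled corpus is only encoded and searched, which mirrors the abstract's point that the method does not need a dedicated classification model per dataset.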

Cite This Study

Trieu et al. (2017) studied this question.

synapsesocial.com/papers/6a20b8b69e00afa23b234b4b https://doi.org/10.1145/3155133.3155206
