Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics. Their inference therefore suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, which makes inference effective by exploiting rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM that enable efficient topic learning. Experiments on real-world short text collections show that BTM discovers more prominent and coherent topics and significantly outperforms state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms in both time efficiency and topic learning.
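The core idea, learning topics from corpus-level biterms rather than per-document word counts, can be illustrated with a minimal batch sketch in Python. It assumes that each short text is treated as a single window, so every unordered pair of word positions is a biterm, and it uses the collapsed Gibbs update commonly given for BTM, with the denominator approximated as (2n_k + Mβ)^2. Document topics are then obtained as P(z|d) = Σ_b P(z|b) P(b|d), with P(b|d) uniform over the text's biterms. The function names, default hyperparameters, and the uniform fallback for texts with fewer than two tokens are hypothetical illustration choices, not details taken from the paper. The two online algorithms are not shown.

```python
import random
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of word positions in one short text.

    The whole text is one context window, so n tokens yield n*(n-1)/2
    biterms. The smaller word id is stored first so (a, b) == (b, a).
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def btm_gibbs(docs, num_topics, vocab_size, alpha=1.0, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling over the corpus-level biterm set (sketch).

    docs: list of token-id lists with ids in [0, vocab_size).
    Returns (theta, phi): corpus topic proportions and topic-word distributions.
    """
    rng = random.Random(seed)
    biterms = [b for doc in docs for b in extract_biterms(doc)]
    K, M = num_topics, vocab_size
    n_z = [0] * K                        # number of biterms assigned to each topic
    n_wz = [[0] * M for _ in range(K)]   # word counts per topic (2 per biterm)
    z = []
    for wi, wj in biterms:               # random initial topic assignments
        k = rng.randrange(K)
        z.append(k)
        n_z[k] += 1
        n_wz[k][wi] += 1
        n_wz[k][wj] += 1

    for _ in range(iters):
        for idx, (wi, wj) in enumerate(biterms):
            k = z[idx]                   # remove the current assignment
            n_z[k] -= 1
            n_wz[k][wi] -= 1
            n_wz[k][wj] -= 1
            # P(z=t | rest) ∝ (n_t + α)(n_wi|t + β)(n_wj|t + β) / (2 n_t + Mβ)^2
            weights = [
                (n_z[t] + alpha)
                * (n_wz[t][wi] + beta) * (n_wz[t][wj] + beta)
                / (2 * n_z[t] + M * beta) ** 2
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=weights)[0]
            z[idx] = k                   # add the new assignment back
            n_z[k] += 1
            n_wz[k][wi] += 1
            n_wz[k][wj] += 1

    total = len(biterms)
    theta = [(n_z[t] + alpha) / (total + K * alpha) for t in range(K)]
    phi = [[(n_wz[t][w] + beta) / (2 * n_z[t] + M * beta) for w in range(M)]
           for t in range(K)]
    return theta, phi


def infer_doc_topics(tokens, theta, phi):
    """P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) uniform over the text's biterms."""
    K = len(theta)
    biterms = extract_biterms(tokens)
    if not biterms:
        return [1.0 / K] * K             # hypothetical fallback: uniform topics
    p_zd = [0.0] * K
    for wi, wj in biterms:
        # P(z|b) ∝ theta_z * phi_z(wi) * phi_z(wj)
        post = [theta[t] * phi[t][wi] * phi[t][wj] for t in range(K)]
        s = sum(post)
        for t in range(K):
            p_zd[t] += post[t] / s / len(biterms)
    return p_zd
```

A toy run on integer word ids might look like this:

```python
docs = [[0, 1, 2], [0, 1], [3, 4, 5], [4, 5]]
theta, phi = btm_gibbs(docs, num_topics=2, vocab_size=6, iters=50)
print(infer_doc_topics([3, 4], theta, phi))
```

Because topic assignments are sampled per biterm over the whole corpus, the sparsity of any single short text matters less than the co-occurrence statistics pooled across all texts.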
Xueqi Cheng
Chinese Academy of Sciences
Xiaohui Yan
China Three Gorges University
Yanyan Lan
Fujian University of Traditional Chinese Medicine
IEEE Transactions on Knowledge and Data Engineering
Institute of Computing Technology, Chinese Academy of Sciences
Cheng et al. studied topic modeling over short texts.
synapsesocial.com/papers/69d81ff0617ce96c42ae309c — DOI: https://doi.org/10.1109/tkde.2014.2313872