March 26, 2014

BTM: Topic Modeling over Short Texts

Abstract

Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging problem for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics; their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms in terms of both time efficiency and topic learning.
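
To make the notion of a biterm concrete, the following is a minimal Python sketch of how corpus-level biterms might be collected. It assumes that a biterm is an unordered pair of words co-occurring in the same short text, that the whole text counts as one context, and that whitespace tokenization is sufficient; the function names `extract_biterms` and `corpus_biterm_counts` are hypothetical and are not taken from the paper. It covers only the counting of co-occurrence patterns, not the topic inference itself.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text.

    Assumption: the whole short text is treated as a single context,
    so any two word positions in it form a biterm.
    """
    # Sort each pair so ("store", "apple") and ("apple", "store")
    # are counted as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus.

    Assumption: naive lowercasing and whitespace tokenization.
    """
    counts = Counter()
    for doc in docs:
        counts.update(extract_biterms(doc.lower().split()))
    return counts


# Two toy short texts (illustrative data only).
docs = ["visit apple store", "apple store opens"]
counts = corpus_biterm_counts(docs)
```

Because the counts are pooled over all texts, a pair such as `("apple", "store")` accumulates evidence across the corpus even though each individual text contributes only a handful of word pairs, which is the corpus-level information the abstract refers to.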

Authors

Xueqi Cheng

Chinese Academy of Sciences

Xiaohui Yan

China Three Gorges University

Yanyan Lan

Fujian University of Traditional Chinese Medicine

Journals

IEEE Transactions on Knowledge and Data Engineering

Institutions

Chinese Academy of Sciences

Institute of Computing Technology
