This paper describes the application of Distributional Clustering [20] to document classification. This approach clusters words into groups based on the distribution of class labels associated with each word. Thus, unlike some other unsupervised dimensionality-reduction techniques, such as Latent Semantic Indexing, we are able to compress the feature space much more aggressively, while still maintaining high document classification accuracy. Experimental results obtained on three real-world data sets show that we can reduce the feature dimensionality by three orders of magnitude and lose only 2% accuracy, which is significantly better than Latent Semantic Indexing [6], class-based clustering [1], feature selection by mutual information [23], or Markov-blanket-based feature selection [13]. We also show that less aggressive clustering sometimes results in improved classification accuracy over classification without clustering.

1 Introduction

The popularity of the Internet has caused an exponent...
Baker and McCallum studied this question.
www.synapsesocial.com/papers/6a10442ad8db7e4a41fa8746 — DOI: https://doi.org/10.1145/290941.290970
Lee D. Baker
Andrew Kachites McCallum
Carnegie Mellon University