What does this research mean for the field?

TF-IDF yields slightly better classification performance than BM25 for term weighting and feature extraction on Twitter data, achieving a higher maximum F1-measure. Novelty: incremental. Consensus alignment: neutral.

April 1, 2019

Term Weighting for Feature Extraction on Twitter: A Comparison Between BM25 and TF-IDF

Abstract

Feature extraction transforms a text document of any format into a list of features that text classification techniques can process easily. It is one of the significant preprocessing techniques in data mining and text classification, and it computes the feature values in documents. Hence, efficient feature extraction techniques such as BM25 and term frequency-inverse document frequency (TF-IDF) are commonly used for term weighting. Nevertheless, BM25 is not a single function, and it tends to over-correct for very long documents. As a result, it may fail to reflect the usefulness or importance of certain features, which decreases classification efficiency. This paper presents a comparative study of feature extraction techniques. Two techniques, BM25 and TF-IDF, were evaluated for weighting terms on Twitter. The TF-IDF feature extraction technique is presented for the comparison between the two. The experiments show that TF-IDF gives the better feature extraction performance: the maximum F1-measure is 89.77 for TF-IDF and 89.16 for BM25.
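To make the two weighting schemes in the abstract concrete, the sketch below computes per-term weights for a few short texts. It is an illustration only, not the paper's implementation: the abstract does not give the exact formulas, so the code assumes a plain TF-IDF (raw term frequency times log(N/df)) and a common Okapi BM25 variant with a smoothed IDF. The function names, the whitespace tokenizer, the sample tweets, and the parameter values k1 = 1.5 and b = 0.75 are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def document_frequencies(docs):
    # Number of documents in which each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tfidf_weights(docs):
    # Assumed variant: raw term frequency times log(N / df).
    n = len(docs)
    df = document_frequencies(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def bm25_weights(docs, k1=1.5, b=0.75):
    # Assumed variant: Okapi BM25 with a smoothed, non-negative IDF.
    # k1 and b are conventional defaults, not values from the paper.
    n = len(docs)
    df = document_frequencies(docs)
    avg_len = sum(len(tokens) for tokens in docs) / n
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        weights.append({
            t: math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            * c * (k1 + 1) / (c + norm)
            for t, c in tf.items()
        })
    return weights


# Hypothetical sample tweets, for illustration only.
tweets = [
    "great phone great camera",
    "bad phone battery",
    "great battery life",
]
docs = [tokenize(t) for t in tweets]
tfidf = tfidf_weights(docs)
bm25 = bm25_weights(docs)

for term in sorted(tfidf[0]):
    print(term, round(tfidf[0][term], 3), round(bm25[0][term], 3))
```

Either weight dictionary can then be turned into a feature vector for a classifier. The main structural difference is that BM25 saturates with term frequency and normalizes by document length, while this TF-IDF variant grows linearly with term frequency.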
