Numerous content-based filtering methods are available for identifying email spam. The dimensionality of the feature space is recognized as one of the leading factors affecting the efficiency of email classification. This study examines feature selection techniques from general text classification and applies them to spam filtering. Classification and prediction are also performed using different parts of the email, such as the header, body, and subject. We present a comparative study of different feature selection methods. Through extensive experiments, we demonstrate that Weighted Mutual Information feature selection applied to the header and body of emails is efficient for email classification.
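To make the idea of mutual-information feature selection concrete, here is a minimal sketch that scores each token by the mutual information (in bits) between its binary presence in an email and the spam/ham label, then keeps the top k tokens. It is plain mutual information, not the Weighted Mutual Information variant evaluated in the study, whose weighting scheme is not given here. The names `tokenize`, `mutual_information`, and `select_features`, the dict layout of an email, and the `header:`/`body:` token prefixes (which keep a word in the header distinct from the same word in the body) are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def tokenize(email):
    """Return the set of prefixed tokens in an email dict with 'header' and
    'body' strings. Prefixing is a hypothetical choice so that 'free' in the
    header and 'free' in the body become separate features."""
    tokens = set()
    for part in ("header", "body"):
        for word in email.get(part, "").lower().split():
            tokens.add(f"{part}:{word}")
    return tokens


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between binary term presence and a binary class.
    n11: spam emails containing the term, n10: ham emails containing it,
    n01: spam emails without it, n00: ham emails without it."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each cell: (joint count, term-marginal count, class-marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    for joint, term_total, class_total in cells:
        if joint > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            mi += (joint / n) * math.log2(n * joint / (term_total * class_total))
    return mi


def select_features(emails, labels, k):
    """Rank tokens by mutual information with the label (1 = spam, 0 = ham)
    and return the k highest-scoring tokens, ties broken alphabetically."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()  # document frequency per class
    for email, label in zip(emails, labels):
        (spam_df if label == 1 else ham_df).update(tokenize(email))
    scores = {}
    for term in set(spam_df) | set(ham_df):
        n11, n10 = spam_df[term], ham_df[term]
        scores[term] = mutual_information(n11, n10, n_spam - n11, n_ham - n10)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with four emails where only the two spam messages share `win` in the header and `free` in the body, `select_features(emails, [1, 0, 1, 0], 2)` returns `["body:free", "header:win"]`, since each of those tokens perfectly separates the classes (1 bit). Swapping `mutual_information` for another scoring function is how a comparison of feature selection methods could be set up on the same pipeline.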
Thomas et al. (Mon,) studied this question.