November 1, 2016

A machine learning approach for authorship attribution for Bengali blogs

Abstract

In this paper we describe an authorship attribution system for Bengali blog texts. We present a new Bengali blog corpus of 3000 passages written by three authors. Our study proposes a text classification system based on lexical features, namely character bigrams and trigrams, word n-grams (n = 1, 2, 3) and stop words, using four classifiers. Among the four classifiers, multilayer perceptrons (MLP) achieve the best results (more than 99%) on the held-out dataset, which indicates that MLP can produce very good results for large data sets and that lexical n-gram based features can be the best features for any authorship attribution system.
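
The lexical features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the feature-name prefixes, and the small English stop-word set are assumptions made only for illustration, and the paper's actual Bengali stop-word list and preprocessing are not given here.

```python
from collections import Counter

# Hypothetical stop-word set for illustration; the paper's Bengali list is not given.
STOP_WORDS = {"the", "a", "of", "and"}


def char_ngrams(text, n):
    """Count overlapping character n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, assuming whitespace tokenization."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def stop_word_counts(text, stop_words=STOP_WORDS):
    """Count occurrences of each stop word in the text."""
    return Counter(token for token in text.split() if token in stop_words)


def extract_features(text):
    """Merge all feature families into one sparse dict of counts."""
    features = {}
    for n in (2, 3):  # character bigrams and trigrams
        for gram, count in char_ngrams(text, n).items():
            features[f"char{n}:{gram}"] = count
    for n in (1, 2, 3):  # word unigrams, bigrams and trigrams
        for gram, count in word_ngrams(text, n).items():
            features[f"word{n}:{gram}"] = count
    for word, count in stop_word_counts(text).items():
        features[f"stop:{word}"] = count
    return features
```

A dictionary of this kind, computed per passage, could then be vectorized and passed to any classifier, such as an MLP. Which vectorization and classifier settings the authors used is not stated in the abstract.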

Cite This Study

Phani et al. (2016) studied this question.

synapsesocial.com/papers/6a20d92f10699ec7be2aa329 https://doi.org/10.1109/ialp.2016.7875984
