March 3, 2026 | Open Access

Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre

Key points

The Random Forest algorithm achieved the highest accuracy on the regional and age-based datasets, with F-measures of up to 0.91.
The system uses sixteen stylometric indicators selected through the Zemberek natural language processing library and evaluates them on six distinct datasets.
Reported accuracy was 73% for regional classification, 62.5% for age-based classification, and 55% for genre classification.
The authors conclude that combining statistical learning with stylometric analysis gives a robust framework for Turkish authorship attribution, and they point to deep learning and transformer-based models as future work.
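
For readers unfamiliar with the metric, the F-measure quoted above is the harmonic mean of precision and recall, computed per class. The following is a minimal sketch of that calculation in plain Python. The labels are made up for illustration and are not data from the study, and the source does not say how per-class scores were averaged, so no averaging is shown.

```python
from collections import Counter


def f_measure_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[t] += 1          # t was the truth but missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for illustration only; not data from the study.
y_true = ["region_a", "region_a", "region_b", "region_b", "region_b", "region_a"]
y_pred = ["region_a", "region_b", "region_b", "region_b", "region_a", "region_a"]

scores = f_measure_per_class(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f1:.2f}")
print(f"accuracy={accuracy(y_true, y_pred):.2f}")
```

Accuracy and F-measure answer different questions, which is why both appear in the summary: accuracy counts correct predictions overall, while the F-measure penalizes a class that is either over-predicted or missed.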

Abstract

Authorship identification seeks to determine the writer of a text by analyzing distinctive linguistic and stylistic features. These characteristics may vary across dimensions such as region, age, and genre. Identifying an author’s stylistic fingerprint is essential in plagiarism detection, digital forensics, and computational linguistics. In this study, the authorship features of Turkish columnists were analyzed using Artificial Neural Networks (ANN), Support Vector Machines (SVM), and decision tree algorithms (J48 and Random Forest). Sixteen stylometric indicators were selected through the Zemberek natural language processing library and evaluated across six distinct datasets. The proposed system allows flexible parameter adjustment through a graphical interface and exports results in ARFF format for reproducibility. Experimental results demonstrated that Random Forest achieved the highest overall accuracy, particularly in regional and age-based datasets, with F-measures reaching up to 0.91. The accuracy rates were 73% for regional classification, 55% for genre classification, and 62.5% for age-based classification. The findings confirm that combining statistical learning with stylometric analysis provides a robust framework for Turkish authorship attribution, paving the way for future studies employing deep learning and transformer-based models.
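
To make the pipeline described in the abstract more concrete, here is a rough sketch of turning raw text into stylometric feature vectors and writing them out as ARFF. Everything in it is an assumption made for illustration: the abstract does not list the sixteen indicators, so the four features below are hypothetical stand-ins, the class labels and sample sentences are invented, and naive regular-expression tokenization replaces Zemberek (a Java library that performs proper Turkish morphological analysis).

```python
import re

# Hypothetical feature names; the source does not list its sixteen indicators.
FEATURES = ["avg_word_len", "avg_sentence_len", "type_token_ratio", "commas_per_word"]


def turkish_lower(text):
    # str.lower() is not Turkish-aware, so map dotted and dotless I first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def extract_features(text):
    """Return the FEATURES values for one text using naive regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", turkish_lower(text))
    if not words or not sentences:
        return [0.0] * len(FEATURES)
    return [
        sum(len(w) for w in words) / len(words),   # average word length
        len(words) / len(sentences),               # average words per sentence
        len(set(words)) / len(words),              # type-token ratio
        text.count(",") / len(words),              # commas per word
    ]


def to_arff(relation, rows, class_values):
    """Serialize (feature_vector, label) rows as ARFF text."""
    lines = [f"@RELATION {relation}", ""]
    for name in FEATURES:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(class_values) + "}")
    lines += ["", "@DATA"]
    for feats, label in rows:
        lines.append(",".join(f"{v:.4f}" for v in feats) + "," + label)
    return "\n".join(lines) + "\n"


# Invented sample texts and labels; not taken from the study's datasets.
samples = [
    ("Bugün hava çok güzel. Sahilde yürüdüm, sonra çay içtim.", "region_a"),
    ("Işıklar söndü! Kimse konuşmadı, kimse kıpırdamadı.", "region_b"),
]

rows = [(extract_features(text), label) for text, label in samples]
arff_text = to_arff("stylometry_demo", rows, ["region_a", "region_b"])
print(arff_text)
```

ARFF is the attribute-relation file format read by Weka, whose J48 classifier is among the algorithms named in the abstract, so a file written this way could be loaded there to try the listed classifiers on one's own texts.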


Cite This Study

Levent et al. studied this question.

synapsesocial.com/papers/69a75cc6c6e9836116a25efb https://doi.org/10.29130/dubited.1728460
