What question did this study set out to answer?

The research aims to create an effective machine learning model for identifying Turkish dialects from audio data collected across Türkiye.

April 10, 2026 · Open Access

Classification of Seven Different Dialects Spoken in Seven Geographical Regions of Türkiye Using Machine Learning Models

Key points

Audio recordings of regional dialects were collected from YouTube, totaling 15,889 seconds.
Features were extracted from the audio using Mel-frequency cepstral coefficients (MFCCs).
Eleven different classification methods were employed to build the machine learning models.
The classification model was optimized for improved accuracy.
The overall classification accuracy for the dialects reached 89%.
F1-scores for the dialects ranged from 80.1% to 96.6%.
The Eastern Anatolia Region showed the highest correct prediction rate of 96.6%.
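The accuracy and per-dialect F1 figures above are standard confusion-matrix metrics. As a minimal sketch of how such numbers are derived, the following computes both from a small confusion matrix. The 3-class matrix and its counts are hypothetical and are not taken from the study; the function names are illustrative only.

```python
def per_class_f1(confusion):
    """Return one F1-score per class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]                               # correctly predicted as c
        fn = sum(confusion[c]) - tp                        # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(confusion):
    """Return the fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical counts for three classes (not data from the study).
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
f1_scores = per_class_f1(example)
overall = accuracy(example)
```

A single overall accuracy can hide weak classes, which is why a per-class F1 range is reported alongside it.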

Abstract

In this study, a machine learning model was developed to predict which of seven Turkish dialects a given speech recording belongs to, using recordings collected from seven different regions of Türkiye. The data were gathered from YouTube, prioritizing recordings likely to reflect regional dialects, chiefly natural conversations and the speech of local people. A total of 15,889 s of audio was collected, with balanced representation of each regional dialect. The recordings were segmented into pieces of a specified size, and features were extracted from them using Mel-frequency cepstral coefficients (MFCCs). Machine learning models were then built on these features using 11 classification methods. After optimization, overall classification accuracy reached 89%, and the per-region F1-scores ranged from 80.1% to 96.6%. Recordings from the Eastern Anatolia Region were predicted correctly at the highest rate, 96.6%.
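To make the MFCC step concrete, here is a minimal pure-Python sketch of the usual computation for a single frame: Hamming window, power spectrum, triangular mel filterbank, log, then a DCT-II. It is an illustration of the general technique, not the authors' pipeline. The 16 kHz sample rate, 256-sample frame, 20 filters, and 13 coefficients are assumptions chosen for the example, and all function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Apply a Hamming window, then return the one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):        # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct2(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(num_coeffs)]


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return dct2(log_energies, num_coeffs)


# Synthetic 440 Hz tone standing in for one frame of speech (assumed 16 kHz audio).
sample_rate = 16000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

In practice a full recording would be split into many overlapping frames, and the per-frame coefficient vectors (or summary statistics of them) would serve as the classifier input. An audio library would normally replace the naive DFT used here.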

Cite This Study

Bayram et al. (Tue,) studied this question.

synapsesocial.com/papers/69d894ec6c1944d70ce05e1c https://doi.org/10.26650/acin.1769461
