In this study, a machine learning model was developed to predict which of seven regional Turkish dialects a given speech recording belongs to, using recordings collected from seven different regions of Türkiye. The datasets were gathered from YouTube, prioritizing recordings with high potential to reflect regional dialects, in particular natural conversations and speech by local residents. A total of 15,889 s of audio was collected, with balanced representation of each regional dialect. The recordings were divided into segments of specific sizes, and features were extracted from each segment as Mel-frequency cepstral coefficients (MFCCs). Machine learning models were then built on these features using 11 classification methods. With the performance gains obtained through optimization, classification accuracy reached 89%, and the F1-scores for the seven regions ranged from 80.1% to 96.6%. Recordings from the Eastern Anatolia Region sat at the top of this range, being correctly predicted at a rate of 96.6%. Overall, the study aimed to improve performance in identifying the regional dialect of audio recordings that carry traces of local and regional speech characteristics in Türkiye.
Bayram et al. studied this question.
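The abstract reports two kinds of figures: a single accuracy value and a separate F1-score for each region. The sketch below shows one standard way such numbers relate, computing overall accuracy and per-class F1 from paired true and predicted labels. It is an illustration only, not the authors' evaluation code: the function name `per_class_f1` is hypothetical, and the labels are invented toy data, not results from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return (accuracy, {label: F1}) for parallel lists of labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = Counter()  # true positives: label predicted and correct
    fp = Counter()  # false positives: label predicted but wrong
    fn = Counter()  # false negatives: label was true but missed

    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1
            fn[true_label] += 1

    f1 = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1[label] = 2 * tp[label] / denom if denom else 0.0

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, f1


# Toy labels, invented for illustration (assumption: not data from the study).
y_true = ["east", "east", "east", "aegean", "aegean", "marmara", "marmara", "marmara"]
y_pred = ["east", "east", "east", "aegean", "marmara", "marmara", "marmara", "aegean"]

accuracy, f1 = per_class_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for label, score in f1.items():
    print(f"F1[{label}] = {score:.3f}")
```

In this toy case one class is predicted perfectly while the other two are confused with each other, so the per-class F1-scores spread out around the overall accuracy, which is the same pattern as a single accuracy figure alongside a range of per-region F1-scores.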