Data mining is the primary method of gathering large volumes of knowledge. Putting such data to practical use requires effective machine learning strategies, and semantic textual similarity (STS) is one of the primary ones. At its core, STS is the identification of words that appear in similar contexts. Early work in STS addressed tasks such as text reuse and word search. The proposed work determines textual similarity with Google's Word2Vec framework, using the continuous bag-of-words (CBOW) algorithm to identify similar words in rap records. The dataset consists of over 50,000 rap records. The key aspect of the proposed methodology is to find words that occur in similar contexts and group them into word clusters, also called bags. To this end, the dataset is first preprocessed to extract features. Once the features are selected, a model is generated by passing the data to the Word2Vec framework. The experiment was repeated over three runs, with a different dataset size in each run, and the accuracy of the similarities produced by the model was measured at the end of each run. The average accuracy achieved across the runs is 85%.
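To make the CBOW idea concrete, here is a minimal NumPy sketch of how continuous bag-of-words learns word vectors: it averages the vectors of the words surrounding a position and trains them to predict the centre word. This is an illustrative toy, not the authors' pipeline or Word2Vec's actual implementation. It uses a full softmax for clarity, whereas Word2Vec uses negative sampling or hierarchical softmax to scale. The function names (`build_vocab`, `cbow_pairs`, `train_cbow`) and all hyperparameter defaults are hypothetical choices made for this example.

```python
import numpy as np


def build_vocab(sentences):
    """Map each distinct token to an integer index (sorted for determinism)."""
    words = sorted({w for s in sentences for w in s})
    return {w: i for i, w in enumerate(words)}


def cbow_pairs(sentences, vocab, window=2):
    """Build (context indices, target index) pairs: CBOW predicts the
    centre word from the words within `window` positions of it."""
    pairs = []
    for s in sentences:
        ids = [vocab[w] for w in s]
        for pos, target in enumerate(ids):
            lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
            context = [ids[j] for j in range(lo, hi) if j != pos]
            if context:
                pairs.append((context, target))
    return pairs


def train_cbow(sentences, dim=10, window=2, epochs=200, lr=0.05, seed=0):
    """Train toy CBOW embeddings with a full softmax and plain SGD.
    Returns the vocabulary and the input embedding matrix (one row per word)."""
    vocab = build_vocab(sentences)
    V = len(vocab)
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(V, dim))   # word (input) embeddings
    W_out = rng.normal(scale=0.1, size=(dim, V))  # output projection
    pairs = cbow_pairs(sentences, vocab, window)
    for _ in range(epochs):
        for context, target in pairs:
            h = W_in[context].mean(axis=0)        # average the context vectors
            scores = h @ W_out
            scores -= scores.max()                # numerical stability
            probs = np.exp(scores) / np.exp(scores).sum()
            grad = probs.copy()
            grad[target] -= 1.0                   # d(cross-entropy)/d(scores)
            grad_h = W_out @ grad                 # gradient w.r.t. the hidden vector
            W_out -= lr * np.outer(h, grad)
            # Each context word contributed 1/len(context) of h; np.add.at
            # accumulates correctly when a word repeats in the context.
            np.add.at(W_in, context, -lr * grad_h / len(context))
    return vocab, W_in
```

For example, `train_cbow([["cash", "on", "my", "mind"], ["money", "on", "my", "mind"]], dim=8)` returns one 8-dimensional vector per distinct word (these lyrics are made-up toy data). At the scale of 50,000 records one would instead rely on a library; gensim's `Word2Vec` class, for instance, selects CBOW with `sg=0`.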
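The abstract does not say how words are grouped into bags once their vectors exist. One simple possibility, shown here purely as an assumption, is a greedy single-pass grouping by cosine similarity. Each word joins the first bag whose seed word is at least `threshold` similar; otherwise it starts a new bag. The function names, the threshold value, and the two-dimensional vectors below are hypothetical. A clustering algorithm such as k-means would be an equally plausible choice.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def group_into_bags(vectors, threshold=0.8):
    """Greedily group words: each word joins the first bag whose seed vector
    has cosine similarity >= threshold, otherwise it seeds a new bag.
    `vectors` maps word -> embedding; iteration follows dict insertion order."""
    bags = []  # list of (seed_vector, member_words)
    for word, vec in vectors.items():
        for seed, members in bags:
            if cosine(vec, seed) >= threshold:
                members.append(word)
                break
        else:
            bags.append((vec, [word]))
    return [members for _, members in bags]


# Hypothetical 2-D vectors standing in for trained Word2Vec embeddings.
toy_vectors = {
    "cash": np.array([1.0, 0.1]),
    "money": np.array([0.9, 0.2]),
    "beat": np.array([0.0, 1.0]),
    "rhythm": np.array([0.1, 0.95]),
}
bags = group_into_bags(toy_vectors, threshold=0.8)  # [['cash', 'money'], ['beat', 'rhythm']]
```

The greedy approach is order-dependent and compares each word only against a bag's seed, which keeps it easy to follow. A centroid-based method would be less sensitive to which word happens to arrive first.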
Chandra et al. (Sat,) studied this question.