September 29, 2025 | Open Access

The Application of Vector Space Models in Intelligent Information Retrieval Systems

Key Points

KazBERT with morphological analysis outperformed multilingual BERT and TF-IDF in retrieval tasks.
Model effectiveness was measured using precision, recall, and F1-score across 24,000 Kazakh text samples.
The study developed a hybrid model tailored to the unique grammatical nature of the Kazakh language.
Introducing a custom morpho-syntactic metric significantly improved retrieval relevance.
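
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1-score are conventionally computed for a single query. It is not the study's evaluation code; the function name `precision_recall_f1` and the document IDs are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query.

    retrieved: iterable of document IDs returned by the search system
    relevant:  iterable of document IDs judged relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)

    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r, f1 = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

In this toy case two of the four retrieved documents are relevant, and two of the three relevant documents were found. Scores reported over a whole corpus would typically be averaged across many queries.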

Abstract

The relevance of this research is driven by the growing need to improve the efficiency of semantic information retrieval amid the rapid expansion of text data, particularly in low-resource languages such as Kazakh. The purpose of the research is to develop a justified approach for selecting and comparing text vectorization models used in intelligent search systems, considering the morphological and syntactic features of the Kazakh language, and to construct a mathematical model for computing semantic similarity in a multidimensional vector space. The methodology is based on the empirical testing of six models (TF-IDF, Word2Vec, FastText, GloVe, BERT, and KazBERT) on a corpus of 24,000 Kazakh texts. Vectorization was performed using CLS-tokens; morphological preprocessing employed the Kaznlp tool. Model effectiveness was assessed using precision, recall, and F1-score metrics. The results demonstrated that KazBERT, combined with morphological analysis, achieved the highest accuracy in handling variable word forms, outperforming multilingual BERT by 11–15% and TF-IDF by over 30%. FastText showed strong resilience to morphological variation but was less effective with syntactically complex queries. The scientific novelty lies in the development of a hybrid model for intelligent search adapted to the agglutinative nature of the Kazakh language, and in the introduction of a custom morpho-syntactic metric that increases sensitivity to grammatical features. The conclusions confirm that adapting vector models to account for grammar significantly enhances retrieval relevance.
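
The abstract describes ranking by semantic similarity in a multidimensional vector space. As a rough illustration of that general idea, the sketch below builds TF-IDF vectors and ranks documents by cosine similarity to a query. It is an assumption-laden toy, not the authors' hybrid model or their morpho-syntactic metric: the smoothed IDF formula, the function names, and the English placeholder tokens are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus have no IDF and are ignored.
    counts = Counter(t for t in query if t in idf)
    q_vec = {t: (c / len(query)) * idf[t] for t, c in counts.items()}
    scores = [(i, cosine(q_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder corpus; real input would be tokenized (and ideally
# morphologically normalized) Kazakh text.
docs = [
    ["library", "opens", "new", "reading", "room"],
    ["city", "budget", "approved"],
    ["reading", "room", "schedule"],
]
ranking = rank(["reading", "room"], docs)
```

Because this sketch matches tokens by exact surface form, two inflected forms of the same word would be treated as unrelated terms. That limitation is the kind of problem the morphological preprocessing and subword-aware models discussed in the abstract are meant to address.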

Cite This Study

Sadykova et al. studied this question.

synapsesocial.com/papers/68da58c9c1728099cfd10a54
https://doi.org/10.32014/2025.2518-1726.370
