June 11, 2026 · Open Access

Morphology Beats Multilingual Embeddings for Kazakh Retrieval: A 300-Query Benchmark with Honest Negative Results

Key Points

Evaluated whether morphology-aware lexical retrieval (BM25 with a Kazakh stemmer) retrieves Kazakh content more effectively than multilingual dense embeddings.
Introduced a benchmark of 300 queries over 8,370 Wikipedia passages, with queries in three categories: inflected, natural, and vocabulary-gap.
Compared five retrieval systems: BM25 with and without a Kazakh stemmer, and three zero-shot dense models (LaBSE, Granite-278M, E5-base).
Assessed statistical significance with a paired bootstrap test (n=300 queries).
Adding the Kazakh stemmer to BM25 improved nDCG@10 by 16% on inflected queries (p=0.0017) and by 9% overall (p=0.0001).
Stemmed BM25 also outperformed zero-shot LaBSE (0.754 vs. 0.481).
Reported three negative results: synonym query expansion hurt all query categories, RRF hybrid fusion failed its pre-registered criteria, and better retrieval did not yield a significant end-to-end RAG accuracy gain with Qwen2.5-7B.
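The significance figures in the Key Points come from a paired bootstrap over per-query scores. As an illustration only, the sketch below computes nDCG@10 with the standard log2 discount and a one-sided paired-bootstrap p-value. The function names `ndcg_at_k` and `paired_bootstrap_p`, the 10,000 resamples, the fixed seed, and the "fraction of resamples whose mean difference is not positive" rule are assumptions; the summary does not state the authors' exact procedure.

```python
import math
import random
from typing import Dict, List, Sequence


def ndcg_at_k(ranked_ids: Sequence[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with graded gains and the standard log2 rank discount."""
    def dcg(gains: List[int]) -> float:
        # 0-based rank i is discounted by log2(i + 2), i.e. rank 1 -> log2(2) = 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def paired_bootstrap_p(scores_a: Sequence[float], scores_b: Sequence[float],
                       n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for 'system B beats system A' (assumed variant).

    Queries are resampled with replacement; the p-value is the fraction of
    resamples whose mean per-query difference (B - A) is not positive.
    Its resolution is 1 / n_resamples.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty per-query score lists")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]  # paired by query
    n = len(diffs)
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / n_resamples


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    qrels = {"d1": 1, "d7": 1}
    print(round(ndcg_at_k(["d3", "d1", "d7"], qrels), 3))  # 0.693
    baseline = [0.2, 0.4, 0.5, 0.3]  # e.g. per-query nDCG@10 without stemming
    stemmed = [0.6, 0.7, 0.5, 0.8]   # e.g. per-query nDCG@10 with stemming
    print(paired_bootstrap_p(baseline, stemmed))
```

Pairing matters here: both systems are scored on the same 300 queries, so resampling query indices (rather than each system independently) keeps per-query difficulty from inflating the variance of the difference.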

Abstract

We present a reproducible retrieval benchmark for Kazakh, an agglutinative, low-resource language. It comprises 300 queries over 8,370 Wikipedia passages in three query categories: inflected, natural, and vocabulary-gap. We evaluate five retrieval systems: BM25 with and without a Kazakh stemmer, and three zero-shot dense models (LaBSE, Granite-278M, E5-base). Adding the Kazakh stemmer significantly improves BM25 retrieval (+16% nDCG@10 on inflected queries, p=0.0017; +9% overall, p=0.0001; n=300, paired bootstrap), and stemmed BM25 outperforms zero-shot LaBSE (0.754 vs 0.481). We also report three honest negative results: synonym query expansion hurts all categories, RRF hybrid fusion fails its pre-registered criteria, and better retrieval does not yield a significant end-to-end RAG accuracy gain on Qwen2.5-7B. All code, data, and results are publicly available at https://github.com/Tim2190/Kaz-RAG-search-benchmark.
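To make the central comparison concrete, here is a minimal sketch of BM25 with and without suffix stripping. The suffix list, `toy_stem`, `analyze`, and the `BM25` class are hypothetical illustrations, not the authors' Kazakh stemmer or BM25 configuration; k1=1.2 and b=0.75 are common defaults assumed here, and the IDF uses the non-negative Lucene-style form.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical, deliberately tiny suffix list (plural and a few case endings).
# NOT the stemmer used in the paper; it over- and under-stems real Kazakh.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер",  # plural
     "дан", "ден", "тан", "тен", "нан", "нен",  # ablative
     "ның", "нің", "дың", "дің", "тың", "тің",  # genitive
     "мен", "бен", "пен",                       # instrumental
     "да", "де", "та", "те",                    # locative
     "ға", "ге", "қа", "ке"],                   # dative
    key=len, reverse=True)  # try longer suffixes first
MIN_STEM = 3  # never strip a word below this many characters


def toy_stem(word: str) -> str:
    """Repeatedly strip the longest matching suffix until none applies."""
    changed = True
    while changed:
        changed = False
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
                word = word[: -len(suf)]
                changed = True
                break
    return word


def analyze(text: str, stem: bool) -> List[str]:
    # \w matches Unicode letters, including Kazakh-specific Cyrillic letters.
    tokens = re.findall(r"\w+", text.lower())
    return [toy_stem(t) for t in tokens] if stem else tokens


class BM25:
    def __init__(self, docs: Sequence[str], stem: bool,
                 k1: float = 1.2, b: float = 0.75) -> None:
        self.stem, self.k1, self.b = stem, k1, b
        self.docs = [analyze(d, stem) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(self.docs)
        # Lucene-style IDF, always positive.
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def search(self, query: str, k: int = 10) -> List[Tuple[int, float]]:
        q_terms = analyze(query, self.stem)
        scores = []
        for i, tf in enumerate(self.tfs):
            # Document-length normalization term of BM25.
            norm = self.k1 * (1 - self.b + self.b * len(self.docs[i]) / self.avgdl)
            s = 0.0
            for t in q_terms:
                f = tf.get(t, 0)
                if f:
                    s += self.idf[t] * f * (self.k1 + 1) / (f + norm)
            scores.append((i, s))
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:k]


if __name__ == "__main__":
    docs = ["Алматы қаласы туралы мақала", "Астана туралы мақала"]
    for stem in (False, True):
        print(stem, BM25(docs, stem=stem).search("Алматыда"))
```

Without stemming, the inflected query "Алматыда" ("in Almaty") shares no token with a passage containing "Алматы", so every score is zero; stripping the locative suffix lets query and passage terms match. This is the kind of mismatch that inflected queries are likely to expose, and a real stemmer handles far more suffixes and exceptions than this toy list.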
