June 23, 2026 | Open Access

Investigating the Effect of Context Window Size on Retrieval-Augmented Generation in Small Language Models

Key Points

This research examines how changing context window sizes affects question-answering performance in small language models using retrieval-augmented generation.
Conducted experiments on the Phi-2 language model with 2.7 billion parameters.
Evaluated context windows from 128 to 512 tokens using BM25 retrieval.
Measured performance using Exact Match as the primary evaluation metric.
RAG improved answer accuracy significantly compared to the no-RAG baseline.
Increasing context window size beyond smaller limits produced minimal additional gains.
Findings suggest smaller models may struggle to utilize larger retrieved contexts effectively.
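
As an illustration of the Exact Match metric named above, the following is a minimal sketch, not the authors' evaluation code. It assumes the SQuAD-style answer normalization that is common in TriviaQA evaluations (lowercasing, stripping punctuation and English articles, collapsing whitespace); the study does not state which normalization it used, and the function names are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper does not specify one.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    normalized_prediction = normalize_answer(prediction)
    return any(normalized_prediction == normalize_answer(g) for g in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return hits / len(predictions)
```

Under this sketch, a prediction such as "The Eiffel Tower." counts as a match for the gold answer "Eiffel Tower", while any extra or missing content word counts as a miss, which is why EM is a strict metric for generative models.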

Abstract

This paper investigates how changing the context size of a 2.7-billion-parameter model affects answer accuracy. The work can contribute toward making AI models more efficient, which would mean faster and more accurate responses and lower electricity consumption. Large Language Models (LLMs) have demonstrated strong performance across a variety of natural language processing tasks. However, smaller language models with limited parameter counts often struggle with factual question answering because of restricted parametric knowledge and reduced reasoning capacity. Retrieval-Augmented Generation (RAG) has emerged as a promising technique for improving factual accuracy by supplying externally retrieved information during inference. This study investigates the impact of RAG and of varying context window sizes on the question-answering performance of a small language model. Using the Phi-2 language model with 2.7 billion parameters, experiments were conducted on the TriviaQA dataset under both RAG and non-RAG prompting conditions. Context windows ranging from 128 to 512 tokens were evaluated using BM25 retrieval, with Exact Match (EM) as the primary evaluation metric. Results showed that RAG significantly improved answer accuracy compared to the no-RAG baseline. However, increasing the context window size beyond the smaller token limits produced minimal performance gains, suggesting saturation effects in context utilization. The findings indicate that while retrieval augmentation substantially benefits small language models, simply increasing the amount of retrieved context may not proportionally improve performance. This suggests that smaller models may be limited in how effectively they can process large retrieved contexts, even when additional information is available.
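
The retrieval setup described in the abstract, BM25 ranking followed by a fixed context budget, can be sketched as below. This is an illustrative sketch under stated assumptions, not the study's pipeline: whitespace-separated words stand in for model tokenizer tokens, `k1=1.5` and `b=0.75` are common BM25 defaults rather than values from the paper, and `bm25_scores` and `build_context` are hypothetical helper names.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokens approximate real tokenizer tokens.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in tokenized:
        doc_freq.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def build_context(query: str, docs: list[str], max_tokens: int) -> str:
    """Concatenate top-ranked passages and truncate to the context budget."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    context_tokens: list[str] = []
    for i in ranked:
        if scores[i] <= 0:
            break  # remaining passages share no terms with the query
        context_tokens.extend(docs[i].split())
        if len(context_tokens) >= max_tokens:
            break
    return " ".join(context_tokens[:max_tokens])


passages = [
    "Paris is the capital of France",
    "Bananas are yellow fruit",
    "France borders Spain and Italy",
]
for budget in (4, 128):  # 128 mirrors the smallest window in the study
    print(budget, "->", build_context("capital of France", passages, budget))
```

Varying `max_tokens` over budgets such as 128, 256, and 512 and scoring each run with Exact Match would reproduce the shape of the comparison the abstract describes, though the actual prompts and tokenizer would differ.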
