This research paper investigates how changing the context size supplied to a 2.7B-parameter model affects answer accuracy. The work can contribute toward making AI models more efficient: faster, more accurate, and less demanding of electricity. Large Language Models (LLMs) have demonstrated strong performance across a variety of natural language processing tasks. However, smaller language models with limited parameter counts often struggle with factual question answering because of restricted parametric knowledge and reduced reasoning capacity. Retrieval-Augmented Generation (RAG) has emerged as a promising technique for improving factual accuracy by supplying externally retrieved information during inference. This study investigates the impact of RAG and of varying context window sizes on the question-answering performance of a small language model. Using the Phi-2 language model with 2.7 billion parameters, experiments were conducted on the TriviaQA dataset under both RAG and non-RAG prompting conditions. Context windows ranging from 128 to 512 tokens were evaluated, with BM25 used for retrieval and Exact Match (EM) as the primary evaluation metric. Results demonstrated that RAG significantly improved answer accuracy compared to the no-RAG baseline. However, increasing the context window beyond the smaller token limits produced minimal performance gains, suggesting saturation effects in context utilization. The findings indicate that while retrieval augmentation substantially benefits small language models, simply increasing the amount of retrieved context may not proportionally improve performance. Smaller models may therefore be limited in how effectively they can process large retrieved contexts, even when additional information is available.
Dev Rajesh (Sun,) studied this question.
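
The evaluation setup described in the abstract (Exact Match scoring, plus a fixed token budget for retrieved context) can be illustrated with a minimal sketch. Several details here are assumptions rather than the study's actual procedure: the answer normalization follows the convention commonly used for EM in question-answering benchmarks (lowercasing, stripping punctuation and English articles), whitespace-separated words stand in for model tokens, and the function names and prompt template are hypothetical.

```python
import re
import string
from typing import List, Optional


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def truncate_context(passages: List[str], budget: int) -> str:
    """Concatenate ranked passages, keeping at most `budget` tokens.

    Assumption: whitespace-separated words approximate model tokens;
    a real run would count tokens with the model's own tokenizer.
    """
    words: List[str] = []
    for passage in passages:
        for word in passage.split():
            if len(words) >= budget:
                return " ".join(words)
            words.append(word)
    return " ".join(words)


def build_prompt(question: str,
                 passages: Optional[List[str]] = None,
                 budget: int = 128) -> str:
    """Build a no-RAG prompt (passages=None) or a RAG prompt (hypothetical template)."""
    if not passages:
        return f"Question: {question}\nAnswer:"
    context = truncate_context(passages, budget)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


def em_score(predictions: List[str], gold: List[List[str]]) -> float:
    """Fraction of predictions that exactly match one of their gold answers."""
    hits = sum(exact_match(p, g) for p, g in zip(predictions, gold))
    return hits / len(predictions)
```

Under these assumptions, comparing context window sizes amounts to calling `build_prompt` with `budget` set to each value in the 128 to 512 range, generating an answer for each prompt, and reporting `em_score` per setting alongside the no-RAG baseline.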