What question did this study set out to answer?

The aim is to create and assess a chatbot that enhances access to academic information for students.

June 20, 2026 | Open Access

An LLaMA 3.1-Based Chatbot with Retrieval-Augmented Generation (RAG) for Academic Services at UPN “Veteran” Yogyakarta

Key Points

Developed a chatbot using LLaMA 3.1 and a Retrieval-Augmented Generation (RAG) framework.
Processed 263 institutional document chunks to form the knowledge base.
Compared the hybrid retrieval system (BM25 lexical search plus semantic vector similarity) with lexical-only and semantic-only baselines using RAG-specific and NLG metrics.
Achieved an answer faithfulness of 0.712 and a context recall of 0.895; faithfulness improved by 29.5% and 32.8% over the respective standalone baselines.
Recorded a Token F1 score of 0.499 and a BLEU score of 0.233, with an average response time of 7.64 seconds.
User evaluation yielded a high overall satisfaction score of 4.46 out of 5.00.
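
The Token F1 score reported above is commonly defined as the harmonic mean of token-level precision and recall between a generated answer and a reference answer. The following is a minimal sketch of that common definition, not the study's own evaluation code: the function name `token_f1`, the lowercase normalization, and the whitespace tokenization are all assumptions, since the summary does not state how answers were tokenized.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated answer and a reference answer.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a generated answer that contains every reference token plus some extra words keeps a recall of 1.0 but loses precision, so verbose yet correct answers score below 1.0.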

Abstract

While universities heavily rely on digital information systems, static websites and manual administrative communication often limit accessibility and responsiveness for students seeking academic information. To address this, this study developed and evaluated an academic chatbot using the LLaMA 3.1 large language model integrated with a Retrieval-Augmented Generation (RAG) framework for Informatics students at Universitas Pembangunan Nasional “Veteran” Yogyakarta. Employing a Rapid Application Development approach, 263 institutional document chunks were processed to construct a knowledge base for a hybrid retrieval pipeline that combines BM25 lexical search and semantic vector similarity. The proposed system was comprehensively benchmarked against standalone lexical-only and semantic-only baselines using both RAG-specific and natural language generation (NLG) metrics. Experimental results demonstrated that the hybrid strategy achieved the highest answer faithfulness (0.712) and context recall (0.895), representing a 29.5% and 32.8% improvement in faithfulness over the respective standalone baselines, thereby ensuring superior factual consistency. Furthermore, the hybrid system recorded a Token F1 Score of 0.499, a BLEU score of 0.233, and a faster average response time of 7.64 seconds due to parallel query execution and context-size optimization. Finally, exploratory user evaluation yielded high satisfaction with an overall score of 4.46 out of 5.00, confirming its viability for real-world academic assistance.
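
The hybrid retrieval idea described in the abstract, combining BM25 lexical scores with semantic vector similarity, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not say how the two rankings are fused, so this example uses reciprocal rank fusion as a stand-in, and the function names, the BM25 parameters `k1` and `b`, the fusion constant `k`, the sample chunks, and the hand-written toy vectors (in place of real embeddings) are all hypothetical. Parallel query execution is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document chunk for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: number of chunks containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between the query vector and each chunk vector."""
    def cosine(u, v):
        dot = sum(a * c for a, c in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(c * c for c in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    return [cosine(query_vec, v) for v in doc_vecs]


def ranks(scores):
    """Map chunk index to its 1-based rank, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc: rank for rank, doc in enumerate(order, start=1)}


def hybrid_rank(lexical, semantic, k=60):
    """Fuse two score lists with reciprocal rank fusion (assumed method)."""
    lex_rank, sem_rank = ranks(lexical), ranks(semantic)
    fused = {i: 1 / (k + lex_rank[i]) + 1 / (k + sem_rank[i]) for i in lex_rank}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical chunks and toy vectors standing in for real embeddings.
chunks = [
    "thesis registration deadline and requirements",
    "campus library opening hours",
    "course registration schedule for informatics",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
query, query_vec = "thesis registration deadline", [1.0, 0.0, 0.0]

order = hybrid_rank(bm25_scores(query, chunks), cosine_scores(query_vec, chunk_vecs))
print(order)  # [0, 2, 1]
```

Rank-based fusion is used in this sketch because BM25 scores and cosine similarities live on different scales; fusing ranks avoids having to normalize either score. A weighted sum of normalized scores would be an equally plausible design, and the study may have used something else entirely.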

Cite This Study

Prayanto et al. studied this question.

synapsesocial.com/papers/6a3632fcdb0793dc1a539647 https://doi.org/10.58920/dsc0201633
