What question did this study set out to answer?

The aim is to evaluate the performance and user experience of a conversational recommender system driven by large language models in a small to medium enterprise (SME) context.

March 26, 2026

EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context

Key Points

Designed and field-tested a conversational recommender system in an SME context.
Collected system metrics and end-user evaluations to assess performance.
Introduced a revised ResQue model for evaluating LLM-driven systems.
Achieved a perceived recommendation accuracy of 85.5%.
Identified a latency of 5.7 seconds and a median cost of $0.04 per interaction as significant challenges.
Noted quality limitations when relying solely on prompt-based learning in production.

Abstract

The integration of large language models (LLMs) into conversational recommender systems (CRS) represents a major advance in their strategic potential. Yet to date, research has predominantly focused on technical frameworks for implementing LLM-driven CRS, at the expense of end-user evaluations and strategic implications for firms, particularly from the perspective of small to medium enterprises (SMEs), which make up the bedrock of the global economy. In the current paper, we detail the design and field performance of an LLM-driven CRS in an SME context using both system metrics and end-user evaluations, and we present a revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Results demonstrate satisfactory system performance (85.5% perceived recommendation accuracy) but underscore latency, cost, and quality challenges. Notably, with a median cost of $0.04 per interaction and a latency of 5.7s, cost-effectiveness and response time emerge as crucial issues, driven predominantly by the use of ChatGPT as a ranker within the retrieval-augmented generation (RAG) technique. Results also show that relying solely on prompt-based learning has quality limitations in a production environment. Strategic considerations for SMEs are outlined in light of the trade-offs in the technical landscape.
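The abstract attributes most of the cost and latency to using ChatGPT as a ranker within RAG. The following is a minimal sketch, under stated assumptions, of why that design is expensive. In a retrieve-then-rerank loop, every retrieved candidate is serialized into the ranking prompt, so the input token count, and with it the cost per interaction, grows with the number of candidates k. Latency also tends to grow with prompt length. Everything here is hypothetical and is not the authors' implementation: the word-overlap retriever stands in for whatever retrieval EventChat actually uses, and the prompt format, the tokens-per-word heuristic, and the per-token prices are placeholders. No LLM is called.

```python
import statistics
from dataclasses import dataclass

# Hypothetical per-token prices; placeholders only, not actual API pricing.
PRICE_PER_INPUT_TOKEN = 0.5e-6
PRICE_PER_OUTPUT_TOKEN = 1.5e-6


@dataclass
class Event:
    title: str
    description: str


def retrieve(query: str, events: list[Event], k: int) -> list[Event]:
    """Stand-in retriever: sort events by word overlap with the query, keep the top k."""
    query_words = set(query.lower().split())

    def overlap(event: Event) -> int:
        return len(query_words & set(f"{event.title} {event.description}".lower().split()))

    return sorted(events, key=overlap, reverse=True)[:k]


def build_ranking_prompt(query: str, candidates: list[Event]) -> str:
    """Hypothetical prompt asking an LLM to re-rank every retrieved candidate."""
    lines = [f"User request: {query}", "Rank these events by relevance:"]
    for i, event in enumerate(candidates, start=1):
        lines.append(f"{i}. {event.title}: {event.description}")
    return "\n".join(lines)


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly 4/3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 4 / 3)


def interaction_cost(query: str, events: list[Event], k: int, output_tokens: int = 50) -> float:
    """Estimated cost of one LLM ranking call over the top-k retrieved events."""
    prompt = build_ranking_prompt(query, retrieve(query, events, k))
    return (estimate_tokens(prompt) * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)


# Toy catalogue and queries (illustrative data, not from the study).
events = [Event(f"Event {i}", "open-air concert with live jazz in the old town") for i in range(30)]
queries = ["jazz concert tonight", "family friendly museum", "live music in the old town"]

# Median estimated cost per interaction for a short and a long candidate list.
median_costs = {
    k: statistics.median([interaction_cost(q, events, k) for q in queries])
    for k in (5, 20)
}
print(median_costs)
```

With these placeholder numbers, the k = 20 median exceeds the k = 5 median. The absolute values are meaningless; the sketch only illustrates the scaling. Shortening the candidate list or swapping the LLM ranker for a cheaper scorer would reduce this cost, possibly at some expense to recommendation quality. That is the kind of trade-off the abstract asks SMEs to weigh.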
