The integration of large language models (LLMs) into conversational recommender systems (CRS) represents a major evolution in their strategic potential. Yet to date, research has predominantly focused on technical frameworks for implementing LLM-driven CRS, at the expense of end-user evaluations and strategic implications for firms, particularly from the perspective of the small to medium enterprises (SMEs) that make up the bedrock of the global economy. In this paper, we detail the design and field performance of an LLM-driven CRS in an SME context using both system metrics and end-user evaluations, and we present a revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Results demonstrate satisfactory system performance (85.5% perceived recommendation accuracy) but underscore latency, cost, and quality challenges. Notably, with median costs of $0.04 per interaction and latency of 5.7 s, cost-effectiveness and response time emerge as crucial issues, driven predominantly by the use of ChatGPT as a ranker within the retrieval-augmented generation (RAG) technique. Results also show that relying solely on prompt-based learning has quality limitations in a production environment. Strategic considerations for SMEs are outlined in light of the trade-offs in the technical landscape.
Kunstmann et al. (Tue,) studied this question.